cloudfoundry / cf-deployment

The canonical open source deployment manifest for Cloud Foundry

License: Apache License 2.0

Shell 21.68% Go 78.32%
cff-wg-app-runtime-deployments

cf-deployment's Introduction

cf-deployment

Purpose

This repo contains a canonical BOSH deployment manifest for deploying the Cloud Foundry Application Runtime by relying on individual component releases. It uses several newer features of the BOSH director and CLI. Older directors may need to be upgraded and have their configurations extended in order to support cf-deployment.

cf-deployment embodies several opinions about the CF Application Runtime. It:

  • prioritizes readability and meaning to a human operator. For instance, only necessary configuration is included.
  • emphasizes security by default.
    • CredHub is used to generate strong passwords, certs, and keys. There are no default credentials, even in bosh-lite.
    • TLS/SSL features are enabled on every job which supports TLS.
  • uses two AZs to provide redundancy for most instance groups.
  • uses Diego (source code) by default.
  • deploys jobs to handle platform data persistence using singleton versions of the PXC release for databases and the CAPI release's singleton WebDAV job for blob storage. See the database and blobstore sections of the deployment guide for more information.
  • assumes load-balancing will be handled by the IaaS or an external deployment.

TLS validation

Many test, development, and "getting started" environments do not have valid TLS certificates installed in their load balancers. For ease of use in such environments, cf-deployment skips TLS validation on some components that access each other via the "front door" of the Cloud Foundry load balancer.

Deployers who have valid or otherwise trusted load balancer certificates should use the stop-skipping-tls-validation.yml opsfile to force the validation of TLS certificates for all components.
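
As a rough sketch of how that opsfile might be applied, assuming the BOSH CLI v2, a deployment named cf, and the opsfile living under the operations directory. The function name, the environment alias my-env, and the domain cf.example.com are placeholders for illustration, not values from this repo:

```shell
# Hypothetical wrapper: deploy the base manifest with TLS validation enforced.
# $1 = BOSH environment alias (placeholder), $2 = system domain (placeholder).
# The opsfile path under operations/ is an assumption about the repo layout.
deploy_with_tls_validation() {
  bosh -e "$1" -d cf deploy cf-deployment.yml \
    -o operations/stop-skipping-tls-validation.yml \
    -v system_domain="$2"
}

# Example call (needs a reachable director):
#   deploy_with_tls_validation my-env cf.example.com
```

Only use this when the load balancer certificate is trusted by the components; otherwise the validated connections would be expected to fail.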

Deploying CF

Deployment instructions are verbose so we've moved them into a dedicated deployment guide here.

Release Versioning

The Semantic Versioning scheme has been adopted by cf-deployment. A detailed description of how Semantic Versioning is applied to CF-Deployment can be found here.

Contributing to CF-Deployment

Although the default branch for the repository is main, we ask that all pull requests be made against the develop branch.

  • Please fill out the PR Template when submitting pull requests. The information requested in the PR form provides important context for the team responsible for evaluating your submission.
  • Please also take a look at the "style guide", which lays out some guidelines for adding properties or jobs to the deployment manifest.

Before submitting a pull request or pushing to the develop branch of cf-deployment, please:

  1. run ./units/test, which interpolates all of our ops files with the bosh CLI.
    • By default, the test suite omits semantic tests, which require both jq and yq to be installed.
    • If you wish to run them, please install these requirements and set RUN_SEMANTIC=true in your environment.
    • Note: it is necessary to run the tests from the root of the repo.
  2. confirm your changes can be successfully deployed with the latest release of cf-deployment and tested with the latest version of CATs.
  3. if modifying backup and restore, run ./scripts/test, which runs a legacy bash suite for backup and restore ops.

If you're adding an Ops-file, you will need to:

  1. document it in its corresponding README.
  2. add it to the ops file tests in units/test.
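
The test-running step above could be wrapped roughly as follows. This is a sketch only: have_all and run_unit_tests are hypothetical helper names, and the only facts taken from this README are the ./units/test path, the RUN_SEMANTIC variable, and the jq/yq requirement:

```shell
# Succeed only if every named tool can be found by the shell.
have_all() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || return 1
  done
}

# Run the ops-file tests from the repo root, enabling the semantic
# tests only when both jq and yq are installed.
run_unit_tests() {
  if [ ! -x ./units/test ]; then
    echo "run this from the root of the cf-deployment repo" >&2
    return 1
  fi
  if have_all jq yq; then
    RUN_SEMANTIC=true ./units/test
  else
    ./units/test
  fi
}
```

Calling run_unit_tests from any other directory fails early instead of producing confusing path errors.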

If you're promoting or deprecating an Ops-file, please follow the Ops-file workflows.

Setup and Prerequisites

cf-deployment requires a BOSH director with a valid cloud-config that has been configured with a certificate authority. It also requires the BOSH CLI, which it relies on to generate and fill in needed variables.

BOSH director and stemcells

cf-deployment requires both BOSH and Linux stemcells.

BOSH CLI

cf-deployment requires the BOSH CLI.

BOSH cloud-config

cf-deployment assumes that you've uploaded a compatible cloud-config to the BOSH director prior to deploying your foundation.

The cloud-config produced by bbl covers GCP, AWS, and Azure, and is compatible by default.

The iaas-support directory includes tools and templates for building cloud-configs for other IaaSes, including bosh-lite, vSphere, OpenStack, and Alibaba Cloud.

For other IaaSes, you may need to do some engineering work to figure out the right cloud config (and possibly ops files) to get it working for cf-deployment.

BOSH runtime-config

cf-deployment requires that you have uploaded a runtime-config for BOSH DNS prior to deploying your foundation. We recommend that you use the one provided by the bosh-deployment repo:

bosh update-runtime-config bosh-deployment/runtime-configs/dns.yml --name dns

Note: BBL v6.10.0 or later will set a runtime config including BOSH DNS when you bbl up.

Deployment variables and CredHub

cf-deployment.yml requires additional information to provide environment-specific or sensitive configuration such as the system domain and various credentials.

To do this in the default configuration, we use CredHub, which is deployed on your BOSH director by default if you are using bbl.

Where necessary credential values are not present, CredHub will generate new values based on the type information stored in cf-deployment.yml.

Note: Since cf-deployment v3.0, CredHub has replaced the now deprecated BOSH vars-store as the default way to store and generate credentials.

Necessary variables that BOSH can't ask CredHub to generate need to be supplied as well.

If the deployment includes only the base manifest (cf-deployment.yml), this is just the system domain. However, some ops files introduce additional variables. See the README summary for the particular ops files you're using for any additional necessary variables.

There are three ways to supply such additional variables:

  1. They can be provided by passing individual -v arguments. The syntax for -v arguments is -v <variable-name>=<variable-value>. This is the recommended method for supplying the system domain.
  2. They can be provided in a yaml file accessed from the command line with the -l or --vars-file flag. This is the recommended method for configuring external persistence services.
  3. They can be stored in CredHub directly with the CredHub CLI. If you do this, then you need to follow the variable namespacing rules respected by BOSH described here.
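
The three options might look roughly like this on the command line. It is a sketch under assumptions: the function names are hypothetical, the deployment is assumed to be named cf, and the CredHub path follows the usual /<director-name>/<deployment-name>/<variable-name> convention, which you should confirm against the namespacing rules linked above:

```shell
# 1. An individual -v argument (recommended for the system domain).
deploy_with_var() {
  bosh -e "$1" -d cf deploy cf-deployment.yml -v system_domain="$2"
}

# 2. A YAML vars file passed with -l (or --vars-file).
deploy_with_vars_file() {
  bosh -e "$1" -d cf deploy cf-deployment.yml -l "$2"
}

# 3. Stored directly in CredHub. Build the namespaced variable name from
#    director name, deployment name, and variable name (assumed convention).
credhub_name() {
  printf '/%s/%s/%s\n' "$1" "$2" "$3"
}

# Store a plain value under that name; $4 is the value to store.
store_in_credhub() {
  credhub set -n "$(credhub_name "$1" "$2" "$3")" -t value -v "$4"
}
```

For example, credhub_name my-director cf system_domain prints /my-director/cf/system_domain. Passing secrets as command-line arguments can leak them into shell history, so option 3 is better suited to non-sensitive values or interactive use.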

Ops Files

The configuration of CF represented by cf-deployment.yml is a workable, secure, fully-featured default. When the need arises to make different configuration choices for your foundation, you can accomplish this with the -o/--ops-file flags. These flags read a single .yml file that details operations to be performed on the manifest before variables are generated and filled. We've supplied some common manifest modifications in the operations directory. More details can be found in the Ops-file README.
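
To see what a set of ops files does before deploying, the manifest can be rendered locally with bosh interpolate. A minimal sketch, assuming it is run from the repo root; ops_flags and preview_manifest are hypothetical helper names, and the ops-file paths you pass are your own:

```shell
# Turn a list of ops-file paths into repeated -o flags, preserving order.
ops_flags() {
  for f in "$@"; do
    printf ' -o %s' "$f"
  done
}

# Render the manifest without deploying, so the result can be inspected.
# Relies on word splitting, so ops-file paths must not contain spaces.
preview_manifest() {
  # shellcheck disable=SC2046
  bosh interpolate cf-deployment.yml $(ops_flags "$@")
}

# Example (file names are placeholders):
#   preview_manifest operations/first.yml operations/second.yml
```

Ops files are applied in the order given, so a later file can modify what an earlier one added.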

The operations subdirectories

"Addon" ops-files make changes to most or all instance groups. They can be applied to the BOSH Director's runtime config, or directly to an individual deployment manifest.

The ops-file to configure platform component logging with rsyslog is such an add-on. Please see the Addon Ops-file README for details.

"Community" ops-files are contributed by the Cloud Foundry community. They are not maintained or supported by the Release Integration team. For details, see the Community Ops-file README

"Experimental" ops-files represent configurations that are in the process of being developed and/or validated. Once the configurations have been sufficiently validated, they will become part of cf-deployment.yml and the ops-files will be removed. For details, see the Experimental Ops-file README.

"Test" ops-files are configurations that we run in our testing pipeline to enable certain features.

"Backup and Restore" ops-files enable and configure BOSH Backup and Restore (BBR). BBR is a CLI utility for orchestrating the backup and restore of BOSH deployments and BOSH directors. It triggers the backup or restore process on the deployment or director, and transfers the backup artifact to and from the deployment or director.

CI

The CI for cf-deployment automatically bumps to the latest versions of its component releases on the develop branch. These bumps, along with any other changes made to develop, are deployed to a single long-running environment and tested with CATs before being merged to main if CATs go green.

Each version of cf-deployment is given a corresponding branch in the CATs repo, so that users can discover which version of CATs to run against their deployments. For example, if you've deployed cf-deployment v6.10.0, check out the cf6.10 branch in cf-acceptance-tests to run CATs.
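
The mapping in that example can be sketched as a small helper. It assumes every branch follows the cf<major>.<minor> pattern of the single v6.10.0 example above, which is an extrapolation; cats_branch is a hypothetical name:

```shell
# Derive the CATs branch name from a cf-deployment version tag.
# Assumes the pattern cf<major>.<minor>, generalized from one example.
cats_branch() {
  version=${1#v}          # strip a leading "v" if present
  major=${version%%.*}    # text before the first dot
  rest=${version#*.}      # text after the first dot
  minor=${rest%%.*}       # text before the next dot
  printf 'cf%s.%s\n' "$major" "$minor"
}

# Example: in a clone of cf-acceptance-tests,
#   git checkout "$(cats_branch v6.10.0)"
```

If the derived branch does not exist in cf-acceptance-tests, fall back to checking the branch list by hand.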

The configuration for our pipeline can be found here.

Migrating from Vars Store to CredHub

CredHub is default as of cf-deployment release v If you've got a long running foundation running a release of cf-deployment that relies on vars-store and want to upgrade to a version that's backed by CredHub, you will need to migrate your credentials from vars-store to CredHub. We have a utility to help you migrate.

Can I Transition from cf-release?

CF-Deployment replaces the manifest generation scripts in cf-release, which have been deprecated and are no longer supported by the Release Integration team. Although the team is no longer working on or supporting migrations from cf-release to cf-deployment, you can still find the tooling and documentation in the cf-deployment-transition repo.

cf-deployment's People

Contributors

acosta11, anexper, ard-wg-gitbot, bradylove, cdutra, changdrew, chunyilyu, ctlong, davewalter, dsabeti, emalm, ericpromislow, heyjcollins, ishustava, jamespollard8, jaresty, jochenehret, jpalermo, jvshahid, nimakaviani, njbennett, robdimsdale, selzoc, sphawley, staylor14, sunjaybhatia, tophat8855, vitreuz, weymanf, zankich

cf-deployment's Issues

sha1 for releases referenced by URL missing in generated manifest

I'm getting Expected SHA1 when specifying remote URL for release 'etcd' - what might I be doing wrong?

Here is what my config.json looks like:

{
  "cf": "integration-latest",
  "etcd": "integration-latest",
  "stemcell": "integration-latest",
  "stubs": ["<redacted>"]
}

and the content of blessed_versions.json actually has those sha1 values:

{
  "releases": [
    {
      "name": "cf",
      "commit": "2eb1e78ae64b4454ff8cd392c79bef25574ffb4c"
    },
    {
      "name": "etcd",
      "version": "18",
      "sha1": "222e26f1f38a23f4355ec2517683fdc7c70704aa",
      "url": "https://bosh.io/d/github.com/cloudfoundry-incubator/etcd-release?v=18"
    },
    {
      "name": "consul",
      "version": "6",
      "sha1": "b9774d0f38235336c2ffb07f762de8177ab9a172",
      "url": "https://bosh.io/d/github.com/cloudfoundry-incubator/consul-release?v=6"
    }
  ],
  "stemcells": {
    "aws": {
      "type": "bosh-aws-xen-hvm-ubuntu-trusty-go_agent",
      "version": "3126",
      "url": "https://bosh.io/d/stemcells/bosh-aws-xen-hvm-ubuntu-trusty-go_agent?v=3126",
      "sha1": "c57c5294a33331d75747bf7593ec8fb822fdd497"
    }
  }
}

After manually adding the sha1: 222e26f1f38a23f4355ec2517683fdc7c70704aa for etcd release 18 to the releases section of the manifest, the deployment got past that check.

This probably happens because ./tools/prepare_deployments doesn't have the sha1: field?
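
Before adding a sha1 to the manifest by hand, it may be worth checking it against the downloaded release tarball. A minimal sketch, assuming either sha1sum or shasum is installed; sha1_of and sha1_matches are hypothetical helper names and the tarball name in the example is a placeholder:

```shell
# Print the SHA-1 of a file, using whichever common tool is available.
sha1_of() {
  if command -v sha1sum >/dev/null 2>&1; then
    sha1sum "$1" | cut -d' ' -f1
  else
    shasum -a 1 "$1" | cut -d' ' -f1
  fi
}

# Succeed only if the file's SHA-1 equals the expected value.
sha1_matches() {
  [ "$(sha1_of "$1")" = "$2" ]
}

# Example (tarball name is a placeholder):
#   sha1_matches etcd-release.tgz 222e26f1f38a23f4355ec2517683fdc7c70704aa
```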

Running out of ephemeral disk on GCP

We're seeing a confusing error message on GCP when trying to update certain jobs:

Response exceeded maximum allowed length

After talking with the BOSH team, the likely culprit is our VM running out of ephemeral disk when trying to untar some stuff. Running df -h on our cc_clock VM shows a tiny 1GB ephemeral partition: /dev/sda3 1.1G 741M 235M 76% /var/vcap/data. On GCP, if you ask for a 5GB root disk, as bbl does for the 5GB_ephemeral_disk vm_extension, bosh will take about 3-4GB of the root disk for the agent and whatnot, and carve an ephemeral partition (/var/vcap/data) out of whatever remains.

Can y'all bump the 5GB_ephemeral_disk root disk size to avoid this issue?

Is uaa.login.client_secret unnecessary?

The job properties for the UAA in cf-deployment.yml include a uaa.login.client_secret property, but that is no longer a property listed in the UAA spec, and it looks to have been removed in cloudfoundry/uaa-release@a33a1f3.

Should this be removed from the manifest and replaced with a login client defined in uaa.clients? Is the login client no longer required in uaa.clients either?

[improvement] Better documentation for stub generation

Would be highly appreciated. So far there is a pretty generic link to http://docs.cloudfoundry.org/deploying/ in the README which doesn't help much. Here is what I'd like to see

Links in Readme to cloud foundry docs are wrong

Hyperlinks are incorrect for the following:

vSphere (not currently supported by this tool)
vCloud (not currently supported by this tool)
OpenStack (not currently supported by this tool)

Clicking on the link leads to:

Page not Found
Visit the homepage.

Retrieving your CF admin password is not obvious

After you've set up your shiny new cf-deployment Cloud Foundry, you'll probably want to log in with the cf CLI. Unless you're familiar with the new BOSH CLI and the large variables section in cf-deployment, it is not obvious that you should run bosh interpolate --path /uaa_scim_users_admin_password env-repo/deployment-vars.yml to retrieve your CF admin password. An example in the README would be helpful; renaming uaa_scim_users_admin_password to cf_admin_password or similar might also be nice.

No qualifying bean of type [org.springframework.mail.javamail.JavaMailSender] while deploying service into PCF

Hi, I am trying to deploy a service which uses Spring Mail to send email. The service works in my local IDE and I am trying to deploy it to the PCF cloud. I did a Gradle build and have a manifest.yml file on the classpath. When I push it, the app fails to start with the following exception. Is there something I am missing, is this PCF behaviour, or will the email service not work in the cloud?

I am using spring boot and I have all the required properties which helps to create a bean for mail in my application.properties file.

Caused by: org.springframework.beans.factory.BeanCreationException: Could not autowire field: private org.springframework.mail.javamail.JavaMailSender com.send.SendController.mailSender; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [org.springframework.mail.javamail.JavaMailSender] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}

default branch "develop" can be painful for consumers

As a consumer of cf-deployment who wants a functioning CF as a black box, I was confused to clone the repo and find the deploy broken because the default branch develop is not guaranteed to be stable.

The README does call out that master is the stable branch, but even knowing this it took me a long time to realize the repo I just cloned was on develop.

I understand the main driver for setting develop to be the default branch is to facilitate pull-requests, which generally should be made against develop not master. I know the frustration that can be caused by having to close PRs because they were made against the wrong branch.

I think that setting the default branch to master will provide a better experience for the consumers who want a CF without caring too much about contributing back, so the question becomes which use-case do we want to optimize for?

There are advantages and disadvantages to both choices. What do you think?

Too much stuff in bosh-lite ops file?

I'm trying to deploy Spark (modifying this example) on CF, and think I'll need to leverage container-to-container networking. The simplest way for me to do this is to deploy CF on BOSH-Lite and leverage this cf-networking-release ops file. That ops file assumes there's a MySQL database, but the bosh-lite ops file in this repo replaces MySQL with Postgres.

Is the bosh-lite ops file doing too much? I understand wanting to give bosh-lite users a simple experience, but it seems like swapping out MySQL for Postgres should be its own thing, and asking bosh-lite users to compose a small handful of building blocks isn't that bad (and is encouraged).

/cc @cloudfoundry/cf-container-networking @cppforlife @wendorf @drich10

Create Service Broker Error - certificate verify failed

Hi,
Deploying a service broker on cf-deployment, based on this Stark & Wayne blog, throws the following error message:

cf create-service-broker haash-broker warreng natedogg https://${broker_url} --space-scoped

Server error, status code: 500, error code: 10001, message: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed

I've successfully created the service broker on PCFDev, so I must be missing something while following the same procedure on cf-deployment.

consul cert empty on bosh lite

On master at ebcd1cb (and develop at SHA: 48eeb42). One instance of the consul job gives the following issues:

bosh deploy output (note, we killed it otherwise it would hang forever)

16:43:17 | Creating missing vms: log-api/b5bf17b2-01a0-4d7f-9649-df81d6a3e190 (0) (00:00:50)
16:43:17 | Creating missing vms: diego-cell/af10dfe4-0e16-4fb8-bcdc-77ea207fdf44 (0) (00:00:50)
16:43:17 | Updating instance consul: consul/4e847ae8-f6e8-4e37-b192-2a63fd374e61 (0) (canary)

consul error output:

/:~# tail -f /var/vcap/sys/log/consul_agent/consul_agent.stderr.log
2017/02/21 16:50:28 [ERR] agent.client: Failed to decode response header: EOF
error during start: timeout exceeded: "rpc error: failed to get conn: remote error: tls: bad certificate"
2017/02/21 16:51:20 [ERR] agent.client: Failed to decode response header: EOF
2017/02/21 16:51:20 [ERR] agent.client: Failed to decode response header: EOF

Looking at the directory:

/:~# cd /var/vcap/jobs/consul_agent/config/certs/
/:/var/vcap/jobs/consul_agent/config/certs# ls -al
total 28
drwxr-xr-x 2 vcap vcap 4096 Feb 21 16:47 .
drwxr-xr-x 3 vcap vcap 4096 Feb 21 16:51 ..
-rw-r----- 1 vcap vcap    1 Feb 21 16:47 agent.crt
-rw-r----- 1 vcap vcap    1 Feb 21 16:47 agent.key
-rw-r----- 1 vcap vcap 1144 Feb 21 16:47 ca.crt
-rw-r----- 1 vcap vcap 1189 Feb 21 16:47 server.crt
-rw-r----- 1 vcap vcap 1676 Feb 21 16:47 server.key

Cannot opt into compiled release after uncompiled release is uploaded

Hi,

We ran into an issue where we were deploying cf-deployment onto bosh-lite without using compiled releases (which is presently just mysql). We later switched to using the use-compiled-releases opsfile, but found it wasn't being utilized; bosh skipped uploading the compiled release and compiled the release anyway.

We spoke with @anEXPer who believed it would work if the uncompiled release was deleted prior to deploying with the compiled release, or perhaps stripping the version attribute from the compiled release in the list of releases in the rendered manifest.

We weren't able to test either of these as compilation succeeded the second time around.

Cheers,

KH && @aashah

Inconsistent use of hyphens and underscores in instance-group names

The names of the instance groups in the cf-deployment.yml manifest are not using separators consistently. Some jobs use hyphens (diego-bbs, diego-brain, diego-cell, route-emitter, tcp-router) while some use underscores (cc_bridge, cc_clock, log_controller). My own preference is for the hyphen to be the separator, but underscore is used more frequently (but not universally) in the job template names and properties.

Thanks,
Eric

Bump releases for Windows operations file in CI

We want to keep getting mileage out of the Windows operations file and it would really help if the releases it includes (garden-windows and the hwc buildpack) were bumped automatically in CI. This would enable all of the consumers of cf-deployment that use windows cells (diego, loggregator, infrastructure, potentially capi and others) to deploy our new BOSH releases.

These releases are on bosh.io here:

certificate error on bosh-lite deployment

Hi,

So I tried the following command with bosh v2, as mentioned in the README:
AH-MacBook-Pro:cf-deployment$ bosh -e lite update-cloud-config bosh-lite/cloud-config.yml

I get the following error message:

Updating cloud config:
Performing request POST 'https://lite:25555/cloud_configs':
Performing POST request:
Post https://lite:25555/cloud_configs: x509: certificate is valid for *.sslip.io, not lite

Exit code 1

Somewhere I read that I should be using www-192-168-50-4.sslip.io instead of lite (which is mapped to 192.168.50.4 in my /etc/hosts), but then I get:

Updating cloud config:
Performing request POST 'https://www-192-168-50-4.sslip.io:25555/cloud_configs':
Performing POST request:
Post https://www-192-168-50-4.sslip.io:25555/cloud_configs: x509: certificate signed by unknown authority

Exit code 1

Help appreciated.

Best Regards

AZ Error on Deploy

Hi,

I'm getting the following error when attempting to deploy using the bosh-deploy-with-created-release CI task from the cf-deployment-concourse-tasks repo and a dev release of consul:

Task 23
01:14:10 | Preparing deployment: Preparing deployment (00:00:00)
            L Error: Instance group 'consul' must specify availability zone that matches availability zones of network 'private'

01:14:10 | Error: Instance group 'consul' must specify availability zone that matches availability zones of network 'private'

Started  Thu Mar 23 01:07:19 UTC 2017
Finished Thu Mar 23 01:14:10 UTC 2017
Duration 00:06:51

Task 23 error

It looks like it might be caused by the extra z3 availability zones in the consul and etcd instance groups in the cf-deployment.yml manifest. Are they supposed to be there?

Create self sign certificate for load balancer

Hi

I was not able to create self-signed certificates as explained in the documentation.
First I had to create a CA and then create a certificate. Only in this way was I able to execute bbl create-lbs.

Best regards

cc_uploader on cc_bridge failed to start due to port binding issue

We were deploying CF on bosh-lite following the instructions in the README and the deployment failed with this error:

10:31:26 | Updating instance diego-cell: diego-cell/14c93c79-ecd8-4bc1-b6bf-db894ea00207 (0) (canary) (00:03:15)
10:48:57 | Updating instance cc-bridge: cc-bridge/5be970c1-0d63-4d60-bb8c-b62b5ce4726e (0) (canary) (00:20:46)
            L Error: 'cc-bridge/0 (5be970c1-0d63-4d60-bb8c-b62b5ce4726e)' is not running after update. Review logs for failed jobs: cc_uploader

10:48:57 | Error: 'cc-bridge/0 (5be970c1-0d63-4d60-bb8c-b62b5ce4726e)' is not running after update. Review logs for failed jobs: cc_uploader

We ssh'ed into the VM and found this repeated several times in /var/vcap/sys/log/cc_uploader/cc_uploader.stdout.log

{"timestamp":"1497265361.568850994","source":"cc-uploader","message":"cc-uploader.ready","log_level":1,"data":{}}
{"timestamp":"1497265361.568930149","source":"cc-uploader","message":"cc-uploader.exited-with-failure","log_level":2,"data":{"error":"Exit trace for group:\ncc-uploader exited with error: listen tcp 0.0.0.0:9090: bind: address already in use\ndebug-server exited with nil\n"}}

And metron process was bound on ::::9090

We ran the deployment again and it succeeded. We were curious why it had failed, so we decided to investigate further and found that the following config files might be responsible for the failure:

  • /var/vcap/jobs/metron_agent/config/metron_agent.json
  • /var/vcap/jobs/cc_uploader/config/cc_uploader_config.json

cc_uploader specifies port 9090 as its listener address, and metron defines port 9090 as its health endpoint port.

It looks like there is a race condition as to which process grabs the port first.

Here are the config files:

metron_agent.json

{
  "Index": "5be970c1-0d63-4d60-bb8c-b62b5ce4726e",
  "Job": "cc-bridge",
  "Zone": "z1",
  "Deployment": "bosh-lite.com",
  "IP": "10.244.0.140",
  "Tags": {
    "deployment": "bosh-lite.com",
    "job": "cc-bridge",
    "index": "5be970c1-0d63-4d60-bb8c-b62b5ce4726e",
    "ip": "10.244.0.140"
  },
  "IncomingUDPPort": 3457,
  "DisableUDP": false,
  "PPROFPort": 0,
  "HealthEndpointPort": 9090,
  "GRPC": {
    "Port": 3458,
    "KeyFile": "/var/vcap/jobs/metron_agent/config/certs/metron_agent.key",
    "CertFile": "/var/vcap/jobs/metron_agent/config/certs/metron_agent.crt",
    "CAFile": "/var/vcap/jobs/metron_agent/config/certs/loggregator_ca.crt"
  },
  "DopplerAddr": "doppler.service.cf.internal:8082",
  "DopplerAddrUDP": "doppler.service.cf.internal:3457"
}

cc_uploader_config.json

{
    "cc_ca_cert": "/var/vcap/jobs/cc_uploader/config/certs/cc/ca.crt",
    "cc_client_cert": "/var/vcap/jobs/cc_uploader/config/certs/cc/client.crt",
    "cc_client_key": "/var/vcap/jobs/cc_uploader/config/certs/cc/client.key",
    "consul_cluster": "http://127.0.0.1:8500",
    "debug_server_config": {
        "debug_address": "127.0.0.1:17018"
    },
    "dropsonde_port": 3457,
    "lager_config": {
        "log_level": "info"
    },
    "listen_addr": "0.0.0.0:9090",
    "log_level": "info",
    "mutual_tls": {
        "ca_cert": "/var/vcap/jobs/cc_uploader/config/certs/cc_uploader/ca.crt",
        "listen_addr": "0.0.0.0:9091",
        "server_cert": "/var/vcap/jobs/cc_uploader/config/certs/cc_uploader/server.crt",
        "server_key": "/var/vcap/jobs/cc_uploader/config/certs/cc_uploader/server.key"
    }
}
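
Using the two values from the configs above (listen_addr from cc_uploader_config.json and HealthEndpointPort from metron_agent.json), the suspected collision can be expressed as a small check. This is only an illustration of the comparison, with hypothetical helper names, and not a diagnosis of which process actually bound the port:

```shell
# Extract the port from a host:port listen address.
addr_port() {
  printf '%s\n' "${1##*:}"
}

# Succeed if the listen address ($1) uses the given port number ($2).
uses_port() {
  [ "$(addr_port "$1")" = "$2" ]
}

# Example with the values from the two config files:
#   uses_port "0.0.0.0:9090" 9090 && echo "both jobs want port 9090"
```

The mutual TLS listener in the same config, 0.0.0.0:9091, does not collide with the metron health endpoint.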

GCP networking issue

We are creating this issue as a place for conversation about GCP networking issues.

We are seeing requests to the cloud controller fail intermittently. The issue primarily shows up while running CATs. The cf cli does not have a timeout value set on its HTTP client, so it just hangs until the test fails. We can reproduce the issue outside of CATs from anywhere on the internet using the cf cli. Here is the test:

cf login # log into the cf deployment
export CF_TRACE=true
while true; do
    echo "--------------------------------------------------"
    date
    time cf create-org foo
    date
    time cf delete-org -f foo
done

Eventually the cf cli will hang. We then track down the vcap-request-id and when we look at the cloud controller's app and nginx logs we see that a 200 response is written out.

Here are the details of our environment that is showing the issue:

  • CF deployed on GCP using the manifest from this repo.
  • Static IP assigned to a local forwarding rule that has a target pool for the gorouter instances.

Stuff we've tried to help CATs pass:

  • We set the MTU for garden containers to 1460 to prevent packet fragmentation coming from the container running CATs. This turned out to be a red herring since running the cf cli from anywhere on the internet also exposes the issue.
  • We patched the cf cli to have a timeout and a new transport for each request. Since the cf cli has a retry loop this helps CATs still pass when we see these failures. This just masks the underlying issue but helps get us green.

Steps we are taking to fix the problem:

  • We are verifying this problem exists against a newly deployed GCP environment.
  • If we see the issue crop up again we will continue to trace the failed requests. The next thing we need to look at is the gorouter logs, to try to find what (if anything) is between the gorouter and the cf cli that might be causing these failures.

Loggregator team members who have context on this issue:

unable to set cf api after updating cf v237 to v241

Hi,

After I migrated cf from version 237 to 241, I'm unable to set the API; I get Error performing request: Get https://api.training.cf.redacted.com/v2/info: EOF. This is the first time I'm posting an issue, so I have no idea what information to provide.
ubuntu@cf-ams-training:~$ cf -v
cf version 6.22.1+6b7af9c-2016-09-24

ubuntu@cf-ams-training:~$ uname -a
Linux cf-ams-training 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@cf-ams-training:~$ bosh releases
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Acting as user 'trainee' on 'my-bosh'

+----------------------------+----------------+-------------+
| Name                       | Versions       | Commit Hash |
+----------------------------+----------------+-------------+
| cf                         | 237            | 87f11091+   |
|                            | 241*           | 638c22f9+   |
|                            | 244            | e2198e12+   |
| cflinuxfs2-rootfs          | 1.15.0*        | d4408672+   |
|                            | 1.33.0         | 15e18f58+   |
| diego                      | 0.1476.0*      | 23caa9d3    |
|                            | 0.1486.0       | e47f7e29    |
| etcd                       | 55             | 45730f57+   |
|                            | 59*            | c2bd33fc+   |
|                            | 70             | d97246d2+   |
| garden-linux               | 0.338.0*       | 38e53b4a    |
|                            | 0.342.0        | b03a9abc    |
| logsearch                  | 203.0.0*       | f85490fb+   |
| logsearch-for-cloudfoundry | 200.0.0+dev.1* | 170bbb31    |
| postgres                   | 1.0.3          | 71dfd61b+   |
+----------------------------+----------------+-------------+

ubuntu@cf-ams-training:~$ bosh deployments
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Acting as user 'trainee' on 'my-bosh'

+------------------------------+------------------------------------------+----------------------------------------------------+--------------+
| Name                         | Release(s)                               | Stemcell(s)                                        | Cloud Config |
+------------------------------+------------------------------------------+----------------------------------------------------+--------------+
| logsearch                    | logsearch-for-cloudfoundry/200.0.0+dev.1 | bosh-openstack-kvm-ubuntu-trusty-go_agent-raw/3215 | none         |
|                              | logsearch/203.0.0                        |                                                    |              |
+------------------------------+------------------------------------------+----------------------------------------------------+--------------+
| openstack-training-AMS       | cf/241                                   | bosh-openstack-kvm-ubuntu-trusty-go_agent-raw/3215 | none         |
|                              | etcd/59                                  |                                                    |              |
+------------------------------+------------------------------------------+----------------------------------------------------+--------------+
| openstack-training-AMS-diego | cf/241                                   | bosh-openstack-kvm-ubuntu-trusty-go_agent-raw/3215 | none         |
|                              | cflinuxfs2-rootfs/1.15.0                 |                                                    |              |
|                              | diego/0.1476.0                           |                                                    |              |
|                              | etcd/59                                  |                                                    |              |
|                              | garden-linux/0.338.0                     |                                                    |              |
+------------------------------+------------------------------------------+----------------------------------------------------+--------------+

Ops File for compiled releases

Would it be possible to have ops files that replace the releases section with compiled releases per IaaS (or at least for GCP)?

Currently multiple teams are waiting for the same releases to be compiled. Is this project a good place for such a file to live?

When using a path like '/Users/pivotal/workspace/cf-release' in the config.json, bosh complains about missing 'file://' in the manifest

I tried to follow the documentation for using tools/prepare_deployments. When my config file contained an absolute path to cf-release (like in the README example), this resulted in a manifest entry like url: /Users/pivotal/workspace/cf-release. This doesn't play well with bosh, which expects a URL scheme; in this case file:// would be appropriate.

I couldn't find the right place to patch this in your shell scripts, sorry.
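The fix described above amounts to adding a scheme to bare absolute paths. Here is a minimal sketch of that normalization in Python, not a patch to the repository's shell scripts. The function name to_release_url is hypothetical, and the rule that anything already containing "://" is left untouched is an assumption.

```python
from pathlib import PurePosixPath


def to_release_url(path_or_url: str) -> str:
    """Return a value with a URL scheme; absolute local paths become file:// URLs."""
    # Assumption: anything that already carries a scheme is passed through as-is.
    if "://" in path_or_url:
        return path_or_url
    path = PurePosixPath(path_or_url)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path or a URL, got {path_or_url!r}")
    # as_uri() prefixes the absolute path with the file:// scheme.
    return path.as_uri()


print(to_release_url("/Users/pivotal/workspace/cf-release"))
# prints file:///Users/pivotal/workspace/cf-release
```

Rejecting relative paths keeps the sketch from guessing which directory they should be resolved against.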

Anchors, instead of links, make IPs difficult to override.

Hi there!

I'm trying to create an override file which will allow for a minimal deployment instead of an HA one.

I'm including lines like these to slim down the static IPs specified:

- type: remove
  path: /instance_groups/name=consul/networks/name=private/static_ips/2
- type: remove
  path: /instance_groups/name=consul/networks/name=private/static_ips/1
- type: remove
  path: /instance_groups/name=nats/networks/name=private/static_ips/1

... unfortunately, the sets of two and three IPs continue to show up all throughout the manifest. That doesn't seem to affect deployment, but it's messy. Here's an example:

          servers:
            lan:
            - 10.0.31.190
            - 10.0.47.190
            - 10.0.63.190

A certain @cppforlife suggested that it's because you're using YAML anchors, instead of BOSH links.
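The behavior can be illustrated with a small Python model. It assumes the YAML parser expands every alias into an independent copy of the anchored list before the remove operations run, and the dictionary layout (including the top-level servers key) is a simplified, hypothetical stand-in for the real manifest.

```python
import copy

# IPs taken from the example above; the key layout is a simplified,
# hypothetical model of a parsed manifest, not the real cf-deployment schema.
CONSUL_IPS = ["10.0.31.190", "10.0.47.190", "10.0.63.190"]


def build_manifest():
    # Assumption: each alias is expanded into its own copy of the anchored
    # value, so the two lists below are independent objects.
    return {
        "instance_groups": [
            {
                "name": "consul",
                "networks": [
                    {"name": "private", "static_ips": copy.deepcopy(CONSUL_IPS)}
                ],
            }
        ],
        "servers": {"lan": copy.deepcopy(CONSUL_IPS)},
    }


def remove_static_ip(manifest, group_name, network_name, index):
    """Mimic a remove op on /instance_groups/name=G/networks/name=N/static_ips/I."""
    group = next(g for g in manifest["instance_groups"] if g["name"] == group_name)
    network = next(n for n in group["networks"] if n["name"] == network_name)
    del network["static_ips"][index]


manifest = build_manifest()
remove_static_ip(manifest, "consul", "private", 2)
remove_static_ip(manifest, "consul", "private", 1)

static_ips = manifest["instance_groups"][0]["networks"][0]["static_ips"]
print(static_ips)                  # ['10.0.31.190']
print(manifest["servers"]["lan"])  # still all three addresses
```

Under that assumption, a remove operation only touches the one path it names, so every other expanded copy of the list has to be overridden separately.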

metron_agent has an implicit dependency on consul_agent that isn't satisfied in all instance groups

The properties defined here: https://github.com/cloudfoundry/cf-deployment/blob/master/cf-deployment.yml#L19-L26 use etcd.service.cf.internal, so the metron_agent job ends up having a dependency on the consul_agent job.

We found that the following instance_groups don't satisfy that dependency:

Expose several CF properties as links?

We are producing a release which is meant to be deployed alongside CF. It's all BOSH-2.0-style work. Right now, to make this work, we need an Operator to copy and paste several values from the CF manifest to a variables file. To do this, we made a template with the following section:

########################
# Cloud Foundry config #
########################

cf_api_url: https://api.bosh-lite.com
cf_uaa_admin_client_secret: admin-secret
cf_admin_username: admin
cf_admin_password: admin
cf_app_domains: [bosh-lite.com]
cf_sys_domain: bosh-lite.com
cf_skip_ssl_validation: true
cf_nats:
  machines: [10.244.0.6]
  user: nats
  password: nats
  port: 4222

As service authors, I don't think we should expect our Operator to copy and paste these values from their cf-manifest.

So this request, then, is to please expose these values as links. I can understand the hesitation around some of them, especially UAA's admin password, but until credhub can be the solution to some of these issues, this is the best a service author has. After all, if the Operator already has access to all manifests, allowing these links isn't a significant additional exposure.

DNS for SSH Proxy

I looked through this repo's README, https://docs.cloudfoundry.org, docs and README for capi-release, and docs and README for bosh-bootloader, and couldn't find any instructions on how to set up DNS for the load balancer in front of the SSH Proxy instances.

Digging through the CC code, I found that it makes a hard-coded assumption that you have DNS for ssh.SYSTEM_DOMAIN set up to point to your SSH Proxy LB (or to the SSH Proxies directly if you're skipping the LB).

It would be nice if cf-deployment explicitly asked the user for system, app, and SSH domains as the only things the user needs to provide for a basic deployment. Basically, the user needs to be responsible for DNS and ingress, and cf-deployment can do the rest. So by the same token, cf-deployment ideally wouldn't assume how the user has chosen to set up that DNS.

What if:

  • capi-release made this a required property in the job spec
  • cf-deployment made it a required input from the user
  • docs.cloudfoundry.org explicitly explained the DNS requirements for setting up an environment, and
  • bosh-bootloader and cf-deployment referred users to those official docs

?

/cc @zrob @evanfarrar @wendorf @drich10

cf-deployment.yml not compliant with YAML spec

Hi,

I'm developing a tool, for my own educational use, to parse the cf-deployment YAML file and visualise the components and dependencies. My tool was parsing the YAML file correctly until Dec 2016. Coming back to the project after a few months, I've noticed the following error message with the latest clone of cf-deployment:

Exception in thread "main" found undefined alias diego_bbs_client_properties
bbs: *diego_bbs_client_properties
in 'reader', line 727, column 16:
^

With some analysis I found that the YAML alias is being referenced before its declaration, which is not compliant with the YAML spec according to various YAML parsers and lint tools available online.

Alias in
Line 727: bbs: *diego_bbs_client_properties
whereas the declaration is in
Line 800: bbs: &diego_bbs_client_properties
Line 801: ca_cert: "((diego_bbs_client.ca))"
Line 802: client_cert: "((diego_bbs_client.certificate))"
Line 803: client_key: "((diego_bbs_client.private_key))"
Moving the declaration above Line 727 fixes the problem. I'm looking for alternative YAML parsers (written in Java) that can overlook the issue of declaration order. If there are none, I hope the cf-deployment team can comment on this issue.
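A check for this ordering problem can be sketched without a full YAML parser. The Python below is a deliberately naive line scanner, not a compliant parser: it assumes anchors and aliases appear as whitespace-preceded &name and *name tokens and it ignores quoting and comments. The function name aliases_before_anchors and the sample text are hypothetical.

```python
import re

# Matches a whitespace-preceded anchor (&name) or alias (*name) token.
TOKEN = re.compile(r"(?:^|\s)([&*])([A-Za-z0-9_-]+)")


def aliases_before_anchors(text):
    """Return (line_number, alias_name) pairs for aliases used before their anchor."""
    defined = set()
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, name in TOKEN.findall(line):
            if kind == "&":
                defined.add(name)
            elif name not in defined:
                problems.append((lineno, name))
    return problems


# Shortened, hypothetical sample reproducing the ordering from the report.
sample = """\
jobs:
  bbs: *diego_bbs_client_properties
properties:
  bbs: &diego_bbs_client_properties
    ca_cert: "((diego_bbs_client.ca))"
"""

print(aliases_before_anchors(sample))
# prints [(2, 'diego_bbs_client_properties')]
```

Running such a scan before handing the file to a strict parser gives the line numbers of the aliases that would need their anchors moved earlier.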

cf login password for bosh-lite + cf-deployment

Hi,

The default cf login credential for cf-release on bosh-lite (admin / admin) is not working with cf-deployment. I understand that there are no default credentials and that secrets are loaded from the deployment-vars.yml file.

Which key inside the deployment-vars.yml file holds the password for the cf login -a api.bosh-lite.com --skip-ssl-validation command?

Regards,
Amitoj

Unclear documentation

In your README, the Setup and Prerequisites -> Bosh Cloud Config section mentions that there is IaaS-specific advice on how to set up a cf-deployment-compatible cloud config on different IaaSes. However, no such documentation appears to exist under Setup and Prerequisites.

Update routers in serial

Hi,

Issue: Currently cf-deployment rolls the routers in parallel, which causes downtime for backends.

Possible fix: To fix this issue we need to update the routers in serial and roll these VMs after UAA (to fetch OAuth tokens for the routing API).

The Routing team has CI coverage for zero-downtime tests, and these have been failing intermittently since we moved to cf-deployment. It would be great if your team could roll out the fix.

Related PR: #87

Regards
Shash

Long MariaDB compilation

I am observing that compiling two packages takes about 1 hour in total on c3.large machines:

11:54:47 | Compiling packages: mariadb/563c214c66c68a3558312fee44c22c30085a663a (00:25:17)
12:20:04 | Compiling packages: xtrabackup/44b8b474086ddbc45a7797c191449da8806ee9d1 (00:27:36)

This is too long; other packages take only several minutes to compile.
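As a quick check of the "about 1 hour" figure, the two durations from the log can be added up. This is a small Python sketch; the helper name parse_hms is hypothetical, and the only inputs are the timestamps and durations quoted above.

```python
from datetime import timedelta


def parse_hms(text):
    """Parse an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Durations copied from the two log lines above.
durations = {"mariadb": "00:25:17", "xtrabackup": "00:27:36"}
total = sum((parse_hms(value) for value in durations.values()), timedelta())
print(total)  # 0:52:53

# The second package starts exactly when the first one finishes,
# which is consistent with the two compiling one after the other.
mariadb_end = parse_hms("11:54:47") + parse_hms(durations["mariadb"])
print(mariadb_end == parse_hms("12:20:04"))  # True
```

The two packages together account for roughly 53 minutes of wall-clock time.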

What is the reason for such long timing and can we make it shorter?

Possible ideas:

  1. Make compilation parallel inside a single package.
  2. Start compilation of these two packages first thing in the deployment; right now they are the last ones.
  3. Run compilation of these two packages in parallel; currently they compile sequentially even though I have 6 compilation VMs.

Duplicated variables for loggregator doppler cert?

This commit introduced a new variable, loggregator_tls_doppler_cert, but the manifest already had a variable doppler_tls_server_cert.

I think these ought to be the same thing.

Loggregator's own scripts only generate 3 cert/key pairs, not 4. And when we do a deployment and set loggregator_tls_doppler_cert to be different from doppler_tls_server_cert then metron fails to start. But apparently when hermione is deployed using identical values for these two variables, then the deploy succeeds.

cc: @mcwumbly

CATS errand missing

The cf-release has an acceptance-test errand that we currently use to make sure our team's changes don't break our test CF deployment. It'd be great if cf-deployment could also (optionally?) provide this errand. For now we're having to upload all of cf-release just to be able to add it back to our deployment.

`name:` missing after generating manifest

The aws stub as described in the documentation is missing meta.environment, which leads to "Deployment name not found in the deployment manifest" when actually deploying the result. I don't know why this would work for people not using the cf-deployment toolchain, but this should probably be fixed in the documentation.
