
segmentio / stack


A set of Terraform modules for configuring production infrastructure with AWS

Home Page: https://open.segment.com

License: MIT License

Makefile 1.31% HCL 69.21% Shell 5.55% Smarty 0.41% Dockerfile 0.93% Python 22.60%

stack's Introduction

⚠️ Unmaintained ⚠️

This repository is unmaintained, but is left as a historical relic for anyone wishing to adapt it. Godspeed!

Segment Stack

The Segment Stack is a set of Terraform modules for configuring production infrastructure with AWS, Docker, and ECS. It's a more 'curated' set of defaults for configuring your AWS environment, while still allowing you to fully customize it.

To get more background on the Segment Stack you can read this blog post about its history.

The Stack comes with:

  • an auto-scaling group of instances to run your services
  • a multi-az VPC with different subnets for availability
  • self-managed services run via docker and ECS
  • an ELB and ECS definition for each service
  • docker logs that populate in CloudWatch
  • a bastion node for manual SSH access
  • automatic ELB logging to S3

Start from scratch or selectively add it to your existing infrastructure; the Stack is yours to customize and tweak.

Quickstart

To run the Stack, you'll need AWS access and Terraform installed; check out the requirements section below.

The easiest way to get the Stack up and running is to create a Terraform definition for it. Copy this snippet into a file named terraform.tf:

module "stack" {
  source      = "github.com/segmentio/stack"
  environment = "prod"
  key_name    = "my-key-name"
  name        = "my-app"
}

This is the base configuration; it will provision everything you need to run your services.

From there, you'll want to run a plan, which will stage the changeset:

$ terraform plan

And if the changes look good, apply them to your infrastructure:

$ terraform apply

This will automatically set up your basic networking configuration, with an auto-scaling default cluster running ECS.

Now that we've got all the basics set up, how about adding a service?

Services pull images from Docker Hub and then run the images as containers via ECS. They are automatically discoverable at <service-name>.stack.local and will run with zero-downtime deploys. We can use the stack//service module to automatically provision all of the required parts of the service, including a load balancer, ECS service, and Route53 DNS entry.

Here's a sample service definition; try adding it to your terraform.tf file:

module "nginx" {
  # this sources from the "stack//service" module
  source          = "github.com/segmentio/stack//service"
  name            = "my-app"
  image           = "nginx"
  port            = 80
  environment     = "${module.stack.environment}"
  cluster         = "${module.stack.cluster}"
  iam_role        = "${module.stack.iam_role}"
  security_groups = "${module.stack.internal_elb}"
  subnet_ids      = "${module.stack.internal_subnets}"
  log_bucket      = "${module.stack.log_bucket_id}"
  zone_id         = "${module.stack.zone_id}"
}

Once the nginx service has been added, simply run another plan and apply:

$ terraform plan
$ terraform apply

Your service should automatically be up and running. You can SSH into your bastion host (find the IP by running terraform output) and connect using the service name:

$ ssh -i <path-to-key> ubuntu@<bastion-ip>
$ curl http://nginx.stack.local/

The bastion IP is shown in the terraform output when the stack is first created. If you missed it, you can still get it from the AWS console.
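If you'd like Terraform to print the bastion IP on every run, you can forward the module's bastion_ip output from your own configuration; a minimal sketch, assuming the terraform.tf from above:

output "bastion_ip" {
  value = "${module.stack.bastion_ip}"
}

After the next apply (or refresh), terraform output bastion_ip will print the address.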

Requirements

Before we start, you'll first need:

  • an AWS account, with credentials Terraform can use (e.g. exported as environment variables)
  • Terraform installed locally
  • an EC2 key pair in your target region, passed to the stack as key_name

Architecture

At a high level, the Stack creates a multi-az VPC, a single auto-scaling cluster, and service definitions within ECS.

Your instances are automatically distributed across the VPC, addresses are translated by NAT gateways, and services are all discoverable via route53 and ELBs.

We'll walk through how each of these fits together in this architecture document.

Networking

By default, the Stack will create a VPC in a single region, spanning multiple availability zones (AZs). The default CIDR block for this VPC is

10.30.0.0/16

This range was chosen from private address space so it won't conflict with other pieces of infrastructure you might run, but the VPC can also be configured with its own CIDR range.

Each availability zone will get its own external and internal subnets. Most of the infrastructure will live in the internal subnets so that it is not directly accessible from the internet.

If you'd like to scale to multiple regions (outside the scope of the current stack), simply add one to the second octet.

10.31.0.0/16 -- my new region

To span availability zones, the regional /16 is split into /18 blocks, one per AZ:

10.30.0.0/18 - AZ A
10.30.64.0/18 - AZ B
10.30.128.0/18 - AZ C
10.30.192.0/18 - Spare

Each availability zone is then subdivided into internal and external spaces, with spare room for growth: a /19 for internal and a /20 for external. The external space is smaller because only a few instances and load balancers should be provisioned into it.

10.30.0.0/18 - AZ A

  10.30.0.0/19 internal
  10.30.32.0/20 external
  10.30.48.0/20 spare

10.30.64.0/18 - AZ B

  10.30.64.0/19 internal
  10.30.96.0/20 external
  10.30.112.0/20 spare

10.30.128.0/18 - AZ C

  10.30.128.0/19 internal
  10.30.160.0/20 external
  10.30.176.0/20 spare
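If the defaults above don't fit your network, the CIDR and subnet ranges can be passed straight into the stack module. A hedged sketch, borrowing the 10.31.0.0/16 example from above and keeping the same /19 internal and /20 external split (the exact ranges and zones are illustrative):

module "stack" {
  source             = "github.com/segmentio/stack"
  name               = "my-app"
  environment        = "prod"
  key_name           = "my-key-name"
  cidr               = "10.31.0.0/16"
  internal_subnets   = "10.31.0.0/19,10.31.64.0/19,10.31.128.0/19"
  external_subnets   = "10.31.32.0/20,10.31.96.0/20,10.31.160.0/20"
  availability_zones = "us-west-2a,us-west-2b,us-west-2c"
}

Several of the issues further down show these same variables in use, passed as comma-separated strings.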

The VPC itself contains a single internet gateway to route traffic in and out of the different subnets. The Stack Terraform will also automatically create three separate NAT gateways, one per external subnet.

Traffic from each internal subnet to the outside world will run through the associated NAT gateway.

Alternatively, setting the use_nat_instances VPC module variable to true will use EC2 NAT instances instead of NAT gateways. NAT instances cost less than NAT gateways, can be shut down when not in use, and may be preferred in development environments. By default, NAT instances will not use Elastic IPs, to avoid a small hourly charge if the NAT instances are not running full time. To use Elastic IPs for the NAT instances, set the use_eip_with_nat_instances VPC module variable to true.
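If you're composing the vpc module directly (see the Module Reference section below), the NAT-instance options are just two more variables on that module. A minimal sketch, assuming the same inputs as the vpc example later in this README:

module "vpc" {
  source                     = "github.com/segmentio/stack//vpc"
  name                       = "${var.name}"
  environment                = "${var.environment}"
  cidr                       = "${var.cidr}"
  internal_subnets           = "${var.internal_subnets}"
  external_subnets           = "${var.external_subnets}"
  availability_zones         = "${var.availability_zones}"
  use_nat_instances          = true # EC2 NAT instances instead of NAT gateways
  use_eip_with_nat_instances = true # attach Elastic IPs to the NAT instances
}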

For further reading, check out these sources:

Instances

Each instance in an ecs-cluster is provisioned using an AMI built in the ./packer directory. By default, this AMI is based on the Ubuntu 16.04 LTS image, and runs all the base programs under systemd.

After boot, systemd will run each of its targets, which includes booting Docker and the ECS agent. The ECS agent will register the instance with a particular cluster, pulled from the environment variables on the instance.

Services

Stack services run within ECS. They include a few key pieces:

  • an ECS task definition
  • an ECS service definition
  • an internal ELB
  • an internal route53 entry

The task definition tells ECS what docker image to run (nginx), and how to run it (env vars, arguments, etc). The service definition tells ECS how many containers of a task to run, and on which cluster to run the containers. The ELB routes traffic to the containers in a service, and route53 assigns a 'nice' name to the ELB.

Service discovery works via vanilla DNS. Whenever a service is provisioned, it will also create an accompanying ELB that routes to the containers in the service, along with a route53 entry pointing at that ELB. For example, the entry provisioned by the 'auth' service would be auth.stack.local, so you can reach it with:

$ curl http://auth.stack.local

For more complicated service discovery which handles cases like versioning, we'd recommend using a service like Consul or etcd.

Bastion

The bastion host acts as the "jump point" for the rest of the infrastructure. Since most of our instances aren't exposed to the external internet, the bastion acts as the gatekeeper for any direct SSH access.

The bastion is provisioned using the key name that you pass to the stack (and hopefully have stored somewhere). If you ever need to access an instance directly, you can do it by "jumping through" the bastion:

$ terraform output # print the bastion ip
$ ssh -i <path/to/key> ubuntu@<bastion-ip> ssh ubuntu@<internal-ip>

Logging

The default AMI that ECS cluster instances run ships with the ecs-agent and a program called ecs-logs pre-configured. While ecs-agent takes care of scheduling services, ecs-logs is in charge of reading the service logs and uploading them to CloudWatch. This is all configured automatically by the default Stack settings.

ecs-logs creates one CloudWatch Logs group for each service; within each group, a CloudWatch Logs stream named after the Docker container running the service holds all the logs generated by that service.

If you're interested in digging further into how ecs-logs works, see the GitHub repository where it's hosted.

Module Reference

To see the full reference for each individual module, see our reference page.

You can reference modules individually by name:

module "vpc" {
  source             = "github.com/segmentio/stack//vpc"
  name               = "${var.name}"
  environment        = "${var.environment}"
  cidr               = "${var.cidr}"
  internal_subnets   = "${var.internal_subnets}"
  external_subnets   = "${var.external_subnets}"
  availability_zones = "${var.availability_zones}"
}

Developing

You can customize any part of the stack you'd like.

AMIs

All of the default AMIs that ship with the Stack are built using Packer. If you'd like to build your own, you can make changes to the ./packer directory and then re-build using:

$ make amis

Terraform

Stack is all vanilla Terraform and AWS, so you can customize it by simply forking the repository and referencing your own modules internally.
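For example, after forking you might point the source at your own copy (the organization name here is a placeholder) and adjust the modules it references:

module "stack" {
  source      = "github.com/your-org/stack"
  environment = "prod"
  key_name    = "my-key-name"
  name        = "my-app"
}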

Examples

To dig further into what you can build with the Segment Stack, we have put together an example app (pingdummy) that shows how to configure a small infrastructure from scratch.

Authors

License

Released under the MIT License

(The MIT License)

Copyright (c) 2016 Segment [email protected]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

stack's People

Contributors

achille-roussel, calvinfo, chatu, dominicbarnes, dukejeffrie, elbuo8, filiptepper, gregwebs, ivanfetch, jalessio, likwid, mthenw, nathanielks, pwhittlesea, rakeshnair, sgrodzicki, sherzberg, tomdavidson, tysonmote, yields, yolken-segment


stack's Issues

Amazon registry

I set up an Amazon container registry and specified a policy to allow anyone to pull/push images.
I was able to push an image to the registry.
However, containers fail to start with an "image not found" error.
I saw a reference on how to enable private registries like Docker Hub; do you know by any chance how to enable the Amazon registry? I assumed it would work out of the box since I'm using the Amazon ECS-optimized AMIs.

EntityAlreadyExists: Instance Profile ecs-instance-profile-stack-prod already exists

I'm receiving the following error:

Error creating IAM instance profile ecs-instance-profile-stack-prod: EntityAlreadyExists: Instance Profile ecs-instance-profile-stack-prod already exists.

Deleting the IAM profile doesn't fix the issue, as it's regenerated and fails each time.

To be more specific, this happens when I run terraform apply and my terraform.tf file looks like this:

module "stack" {
  source      = "github.com/segmentio/stack"
  environment = "prod"
  key_name    = "key"
  name        = "stack"
}

Any thoughts? Would love to get this up and running 👍 I'm using the blog as a tutorial :)

Error when running 'terraform plan' when adding nginx module from quickstart

I got past the first couple of steps, but when I add in the nginx module and try terraform plan, I get the following error:

Error configuring: 6 error(s) occurred:

* 1:3: unknown variable accessed: var.security_groups in:

${var.security_groups}
* 1:3: unknown variable accessed: var.log_bucket in:

${var.log_bucket}
* 4:22: unknown variable accessed: var.port in:

  [
    {
      "containerPort": ${var.container_port},
      "hostPort": ${var.port}
    }
  ]

* 1:3: unknown variable accessed: var.port in:

${var.port}
* 1:3: unknown variable accessed: var.subnet_ids in:

${var.subnet_ids}
* 1:3: unknown variable accessed: var.zone_id in:

${var.zone_id}

I'm using Terraform v0.6.16

subnet_ids expect string but is list

I get the following error when creating a web-service module.

Error configuring: 1 error(s) occurred:

* variable subnet_ids in module browser_client should be type string, got list

Got the configuration from pingdummy. Unsure if this is a Stack issue or a pingdummy issue, but I thought I would report it regardless. Changing my config to join with "," seemed to fix the issue:

subnet_ids       = "${join(",",module.stack.external_subnets)}"

S3 Bucket name already in use

Testing the stack I received the error that the S3 Bucket name was already in use. I went through most of the code and couldn't find where to change it. Any help would be greatly appreciated.

Repeated prompts for region variable

With this config:

module "stack" {
  source      = "github.com/segmentio/stack"
  name        = "mystack"
  environment = "prod"
  key_name    = "ml623"
  region      = "us-east-1"
  availability_zones = "us-east-1b,us-east-1c"
}

I get prompted 8 times for the region! Answering the prompts with ENTER or entering the region name seems to behave the same way. Do I need to add some more config elements?

 » terraform -v   
Terraform v0.6.16
 » terraform apply
provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value: 

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value: 

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value: 

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value: 

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value: 

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value: 

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value: 

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value: 

Private subnet mistake on networking architecture diagram

In the README there's a diagram of the network subnet architecture:

However, there's a mistake with the private subnet addresses in us-west-2b and us-west-2c. They should read 10.30.64.0/19 and 10.30.128.0/19 respectively. I can't find the original file to correct, only the png is in the repo.

If someone can point me to the original editable file I can submit a PR.

ECS-Cluster Module Doesn't Use Name Variable When Adding Instances to Cluster

The ecs-cluster module doesn't use the name variable when adding EC2 instances to the cluster. All instances are added to the default cluster.

The ecs-agent uses the ECS_CLUSTER environment variable to determine which cluster to associate the instance with. When that isn't provided, it joins the default cluster by default.

Question: Persistent volumes in services

I am wondering how you attach volumes to your services. Based on the stack task module, I see the volumes nodes defined but not exposed or used.

Are you just running stateless containers and haven't needed to mount volumes or have found a different way of doing this?

Questions about managing service/task revisions in combination with continuous deployment

I'm not sure if this is the right place to ask these questions, but since there is no comments section on the blog post I figured this may be the best place to ask.

As the title suggests, my questions are about continuous deployment of services/tasks. Currently I have a similar setup to yours, but ended up not managing the ECS services and tasks through Terraform because I wanted to continuously deploy my applications. So I build a Docker container for each commit to master, tag it with a version, and update the task definition/service accordingly through the CLI.

Now I'm curious how you do this, or would do this? It looks like you track the service and task revisions through Terraform, and updating the Terraform files for every version is not really practical when deploying this frequently.

Opinionated fork

Hello,

First, cheers on such a great project! I have learned a ton just from reading the terraform structure y'all have created.

I have created a customized fork for my own needs. I want to make sure I do everything on the up and up licensing wise. I believe the main requirement is to keep your original Licence.md at the root of my repository, and maintain any copyright notices that you have (I'm not sure what these would be, I haven't come across any yet).

Additionally, I have given attribution at the top of my README, as I am very appreciative of the work y'all have done with this project.

Am I missing anything else?

Thanks!

Errors from basic terraform config

Having a tough time getting Stack deployed. I'm working with the absolute basic terraform config:

module "stack" {
  source      = "github.com/segmentio/stack"
  environment = "prod"
  key_name    = "id_rsa"
  name        = "my-app"
}

I've downloaded the terraform binary (v0.7.13), added my public key to us-west-2, run terraform get and terraform plan all without problems. My AWS credentials are added as environment variables. Then I tried terraform apply.


After 10 minutes of trying to create the autoscaling cluster, it gives up and reports the following errors:

Error applying plan:

2 error(s) occurred:

* aws_s3_bucket.logs: Error creating S3 bucket: BucketAlreadyExists: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
        status code: 409, request id: 4BB52027C19E648C
* aws_autoscaling_group.main: "my-app": Waiting up to 10m0s: Need at least 3 healthy instances in ASG, have 0

This is running on a completely clean new AWS account, so there are no resources kicking around from old projects that could be conflicting.

ecs_cluster my bootcmd not executed in instance

I have an issue with bootcmd: the user data from https://github.com/segmentio/stack/blob/master/ecs-cluster/files/cloud-config.yml.tpl is not applied to my instance.
I checked the user data via the AWS console and all the params defined by ecs_cluster are there, but when I SSH to the instance and check /etc/ecs/ecs.config, this data is missing.
Here is the output:

DOCKER_HOST=unix:///var/run/docker.sock
ECS_LOGLEVEL=warn
ECS_LOGFILE=/ecs-agent.log
ECS_CHECKPOINT=true
ECS_DATADIR=/data
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1h
ECS_AVAILABLE_LOGGING_DRIVERS=["journald"]

This causes a lot of problems: I can't add the instance to a custom cluster, can't pull images from Docker Hub, etc.
I would really appreciate any suggestions, thanks!

estimated pricing

This looks so sweet :-)

Not sure if I missed this somewhere, but it'd be great to get a sense of the base cost without running any services.

CSV in some of the modules over lists

I noticed in some of the modules we are still using comma separated values that are then split later. Some of the modules started converting these to the native list type. Just curious if there is a reason they all weren't converted. Are some uses better than others for CSV over list?

Quickstart not working

Hello,

I just wanted to try the "Quickstart" but got the following errors when running "terraform apply" command:

Error applying plan:

7 error(s) occurred:

  • aws_subnet.internal.1: Error creating subnet: InvalidParameterValue: Value (us-west-2b) for parameter availabilityZone is invalid. Subnets can currently only be created in the following availability zones: us-east-1a, us-east-1b, us-east-1d, us-east-1e.
    status code: 400, request id: 18263fde-a8ef-4696-a08a-f0f4c7a76856
  • aws_subnet.internal.2: Error creating subnet: InvalidParameterValue: Value (us-west-2c) for parameter availabilityZone is invalid. Subnets can currently only be created in the following availability zones: us-east-1a, us-east-1b, us-east-1d, us-east-1e.
    status code: 400, request id: 30d9c3c5-ab70-4607-ac53-727e89b99d95
  • aws_subnet.internal.0: Error creating subnet: InvalidParameterValue: Value (us-west-2a) for parameter availabilityZone is invalid. Subnets can currently only be created in the following availability zones: us-east-1a, us-east-1b, us-east-1d, us-east-1e.
    status code: 400, request id: eabdf1f0-0e82-410e-843c-cf61968af01e
  • aws_subnet.external.1: Error creating subnet: InvalidParameterValue: Value (us-west-2b) for parameter availabilityZone is invalid. Subnets can currently only be created in the following availability zones: us-east-1a, us-east-1b, us-east-1d, us-east-1e.
    status code: 400, request id: 692192b3-822b-4149-9294-ac710b353586
  • aws_subnet.external.2: Error creating subnet: InvalidParameterValue: Value (us-west-2c) for parameter availabilityZone is invalid. Subnets can currently only be created in the following availability zones: us-east-1a, us-east-1b, us-east-1d, us-east-1e.
    status code: 400, request id: 38847188-311a-4edb-9c76-3a2782087f6f
  • aws_subnet.external.0: Error creating subnet: InvalidParameterValue: Value (us-west-2a) for parameter availabilityZone is invalid. Subnets can currently only be created in the following availability zones: us-east-1a, us-east-1b, us-east-1d, us-east-1e.
    status code: 400, request id: 3885f22e-823d-4d51-aa7b-1884feaf3dcd
  • aws_launch_configuration.main: InvalidAMIID.NotFound: The image id '[ami-f3985d93]' does not exist
    status code: 400, request id: dc5aebf9-3abd-4232-a45d-71f75a3f5654

Thanks.

[Question] What subnet would a worker service be launched within?

I'm not sure how the ECS scheduler decides which subnet to pick for launching a service, but I'm guessing it depends on the subnets that have been configured for the ELB of the service. However, in the case of a worker, which does not have an ELB configured, how can it be guaranteed that it will be launched within an internal subnet as opposed to an external one?

Attach ECS ASG to Web-Service ELB

I'm not sure it's very clear how to achieve this. I have a web-service defined which spins up an ELB. The problem is that it comes up without any instances attached to it. I have to manually go to the ELB and add the ECS instances to it. I wanted to use the aws_autoscaling_attachment resource, but the right output variables aren't defined for me to leverage it. Ideally I would think it would be something like this:

# Create a new load balancer attachment
resource "aws_autoscaling_attachment" "asg_attachment" {
  autoscaling_group_name = "${module.stack.asg}"
  elb                    = "${module.myapp.elb}"
}

Am I doing this wrong or is this functionality missing?

Thanks!

Error parsing JSON: invalid character '$'

Is the error below an actual error?

» terraform -v                         
Terraform v0.6.16

provider "aws" {
  region = "us-west-1"
}

module "stack" {
  source      = "github.com/segmentio/stack"
  name        = "mystack"
  environment = "prod"
  key_name    = "ml623-2"
  availability_zones = "us-west-1a,us-west-1c"
}
+ module.stack.s3_logs.aws_s3_bucket.logs
    acl:              "" => "private"
    arn:              "" => "<computed>"
    bucket:           "" => "mystack-prod-logs"
    force_destroy:    "" => "false"
    hosted_zone_id:   "" => "<computed>"
    policy:           "" => "Error parsing JSON: invalid character '$' looking for beginning of value"
    region:           "" => "<computed>"
    tags.#:           "" => "2"
    tags.Environment: "" => "prod"
    tags.Name:        "" => "mystack-prod-logs"
    website_domain:   "" => "<computed>"
    website_endpoint: "" => "<computed>"

Null zone_id error, missing requirement?

While following along with the pingdummy example and some of your documentation, I've been running into an issue where terraform plan and terraform apply both fail due to the error:

* variable "main" is nil, but no error was reported

I was able to determine the ambiguous error is caused by ${module.stack.zone_id} missing while defining a service. An example portion of my terraform.tf:

module "stack" {
  source      = "github.com/segmentio/stack"
  name        = "my_clients_name"
  environment = "prod"
  key_name    = "bastion-ssh"
  region      = "us-east-1"
  availability_zones = "us-east-1c,us-east-1d,us-east-1b,us-east-1e"
}

module "my_rails_service" {
  source         = "github.com/segmentio/stack/service"
  image          = "${aws_ecr_repository.my_rails_service.repository_url}"
  port           = 3000
  environment     = "${module.stack.environment}"
  cluster         = "${module.stack.cluster}"
  iam_role        = "${module.stack.iam_role}"
  zone_id         = "${module.stack.zone_id}" <----- cause of the error 
  security_groups = "${module.stack.internal_elb}"
  subnet_ids      = "${module.stack.internal_subnets}"
  log_bucket      = "${module.stack.log_bucket_id}"
  command = "bundle exec rails s -p 3000 -b '0.0.0.0'"
}

When digging down deep enough, I find that the missing field is an output of the Terraform AWS ELB resource. From the examples it seems like the default setup should be able to compute the zone_id; am I missing something? (I fiddled around with the DNS module example in pingdummy for a while as well, but no luck.)

Tests Don't Include Nested Modules

The following expression is currently used to identify Terraform files within each Module folder:

modules=$(ls -1 */*.tf | xargs -I % dirname %)

  • bastion
  • defaults
  • dhcp
  • dns
  • ecs-cluster
  • elb
  • iam-role
  • iam-user
  • rds-cluster
  • s3-logs
  • security-groups
  • service
  • task
  • vpc
  • web-service
  • worker

The issue is that this doesn't include Terraform files within nested Module folders (e.g. web-service/elb)

(Not sure why there are two elb modules anyway...)

service and web-service fail when using ECR

If you try to setup a service or web-service using an image from ECR, Terraform fails with errors:

variable "main" is nil, but no error was reported

The reason is that the name is derived from image, replacing / with -. This works for names like segmentio/pingdummy, but when the image is hosted on ECR, it comes in the format aws_account_id.dkr.ecr.region.amazonaws.com/my-web-app. Those dots in the URL cause Terraform to fail.

Current workaround: Just specify a name when defining your service.
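In practice that means adding an explicit name next to the ECR image; a sketch based on the service example from the quickstart (the ECR repository resource is taken from the issue above and is illustrative):

module "my_rails_service" {
  source          = "github.com/segmentio/stack//service"
  name            = "my-rails-service" # explicit name, so it isn't derived from the ECR URL
  image           = "${aws_ecr_repository.my_rails_service.repository_url}"
  port            = 3000
  environment     = "${module.stack.environment}"
  cluster         = "${module.stack.cluster}"
  iam_role        = "${module.stack.iam_role}"
  security_groups = "${module.stack.internal_elb}"
  subnet_ids      = "${module.stack.internal_subnets}"
  log_bucket      = "${module.stack.log_bucket_id}"
  zone_id         = "${module.stack.zone_id}"
}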

[Question] How to manage vm/cluster resource for non ecs tasks?

Hi Humans, thanks for open sourcing this repo :).
I would like to know how you handle non-ECS tasks/containers with regard to available resources, in particular the "standalone" Docker container ecs-logs that is launched when the VM starts. Do you trust that its footprint will be low enough that you can dismiss it, like ecs-agent, or do you impose some kind of constraints (--memory, --cpu, ...)?
AFAIU the ECS scheduler won't take those into account when scheduling work.

[question] Stack version locking

Are there any plans to version the stack to lock in the functionality? Currently when creating an instance of the stack the default behavior is to pull the latest from master:

module "stack" {
  source             = "github.com/segmentio/stack"
}

Unless you always 100% guarantee everything is backwards compatible, this creates problems when you have long-running instances and automation. If I don't check the .terraform/modules folder into my repo, any time I do terraform get I'm going to get updated modules.

Terraform has the ability to specify a branch/version with:

module "stack" {
  source             = "github.com/segmentio/stack?ref=1.0.0"
}

This would guarantee we always get the version we currently have deployed and would not run into a situation where we suddenly get a diff that could cause issues.

I'm curious to hear as to how other people are solving this.

Bastion question

Hello,

I was able to set up the stack with a bastion instance and 2 private instances.
I was able to SSH into the bastion with ssh -i "my-pem.pem" ubuntu@bastion-ip.
However, I'm having trouble SSH-ing into the private EC2 instances.
I ran ssh -i "my-pem.pem" ubuntu@bastion-ip ssh private-ip, but I'm getting a Permission denied (publickey) error.
Also, I'm not really sure how the bastion really works. Where does it get the "my-pem.pem" file from to SSH into the private instances?

missing provider "aws" in build instructions?

I don't have much experience with Terraform, so I'm probably doing this wrong, but I was running into issues running terraform plan. I have aws-cli configured and set up, along with a generated keypair.

When running terraform plan, I got the same question over and over:

$ stack : terraform plan
provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value:

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value:

provider.aws.region
  The region where AWS operations will take place. Examples
  are us-east-1, us-west-2, etc.

  Default: us-east-1
  Enter a value:

Eventually followed by:

Error refreshing state: 8 error(s) occurred:

* 1 error(s) occurred:

* InvalidClientTokenId: The security token included in the request is invalid.
    status code: 403, request id: 4e018f76-484f-11e6-acd8-238e6fba343f
* 1 error(s) occurred:

* InvalidClientTokenId: The security token included in the request is invalid.
    status code: 403, request id: 4e01b687-484f-11e6-acd8-238e6fba343f
* 1 error(s) occurred:

In order to get this working I added a provider to the top of my terraform.tf file:

provider "aws" {
  access_key = "<access-key>"
  secret_key = "<secret-key>"
  region     = "us-east-1"
}

module "stack" {
  source = "github.com/segmentio/stack"
  name = "my-project"
  environment = "prod"
  key_name = "<key-name>"
}

Is this good practice? Or is there a better way to handle this? You may just want to clarify this part for the Terraform newbies out there :-)

[Question] What if bastion will be VPN server?

Hi.
First, I need to say this is an awesome repo. Thank you for your work.

I have a small question/suggestion. You propose using the bastion host to SSH to other instances. Instead, we could install a VPN server there and SSH directly to each instance.
WDYT?

[Help] ELB not working?

I am testing with a pretty simple implementation of stack that looks like this:

provider "aws" {
  region = "us-east-1"
}

module "stack" {
  source      = "./stack"
  name        = "staging-my-app"
  environment = "staging"
  key_name    = "stack"
  availability_zones = ["us-east-1a","us-east-1b","us-east-1c"]
  region = "us-east-1"
}

module "domain" {
  source = "./stack/dns"
  name = "nginx-test.com"
}

module "nginx_test" {
  source         = "./stack/web-service"
  name           = "nginx-test"
  image          = "nginx"
  port           = 80
  container_port = 80
  desired_count  = 3

  ssl_certificate_id = "~redacted~"

  environment       = "${module.stack.environment}"
  cluster           = "${module.stack.cluster}"
  iam_role          = "${module.stack.iam_role}"
  security_groups   = "${module.stack.internal_elb}"
  log_bucket        = "${module.stack.log_bucket_id}"
  internal_zone_id  = "${module.stack.zone_id}"
  external_zone_id  = "${module.domain.zone_id}"
  subnet_ids        = "${join(",", module.stack.external_subnets)}"
}

output "bastion_ip" {
  value = "${module.stack.bastion_ip}"
}

When I terraform apply, everything looks good. After a few minutes all of the tasks add themselves to the ELB and the health checks pass. I can ssh into the bastion server and curl an individual nginx instance directly:

$ curl ip-10-30-88-224.ec2.internal
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

...but if I try to reach the load balancer URL that is shown in the AWS console, nothing loads 😭 I'm at a loss as to why it can't connect me to the containers even though the health checks can.
Any advice?

ClusterContainsContainerInstancesException

I'm more than a bit excited about Stack. This is not nitpicking, but sharing info because I experienced quite a few issues. Using Terraform 0.6.16.

On a fresh and brand new stack I have the following plan:


module "autoreg_stack" {
  source      = "github.com/segmentio/stack"
  environment = "iac"
  key_name    = "autoreg"
  name        = "autoreg-stack"
  domain_name = "autoreg-stack.local"
  region      = "us-west-2"
  availability_zones  = "us-west-2a,us-west-2b,us-west-2c"
  cidr        = "192.168.0.0/16"
  internal_subnets  = "192.168.0.0/19,192.168.64.0/19,192.168.128.0/19"
  external_subnets  = "192.168.32.0/20,192.168.96.0/20,192.168.160.0/20"
  ecs_instance_type = "t2.small"
  ecs_instance_ebs_optimized  = false
}

Unfortunately, apply does not make it all the way through:

Error applying plan:

1 error(s) occurred:

* aws_ecs_cluster.main: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active.
        status code: 400, request id: 8bc05c80-5089-11e6-a6e1-7bf5f1238842

But if I plan and apply again, then Terraform says there are no changes to be made.

Bootstrapping AWS

Hi,

Could you elaborate on the process of bootstrapping AWS from scratch?

It would also be interesting to hear about the operations part: how does the stack evolve? Do you use Atlas or home-grown tools to keep track of state and apply changes?

Resource count can't reference module variable (Question)

// This will allow route traffic through the VPC peering
resource "aws_route" "internal" {
  count                     = "${length(module.stack.internal_subnets)}"
  route_table_id            = "${element(split(",", module.stack.internal_route_tables), count.index)}"
  destination_cidr_block    = "10.22.0.0/16"
  vpc_peering_connection_id = "pcx-5660d13f"
}

Error loading Terraform: module root: 1 error(s) occurred:

  • aws_route.internal: resource count can't reference module variable: module.stack.internal_subnets

Any idea what I'm doing wrong here?
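For reference, Terraform of that era couldn't interpolate a module output into count. A common workaround (an assumption here, not an official fix) was to hard-code the count to the known number of internal subnets:

resource "aws_route" "internal" {
  count                     = 3 # hard-coded: one route per internal subnet
  route_table_id            = "${element(split(",", module.stack.internal_route_tables), count.index)}"
  destination_cidr_block    = "10.22.0.0/16"
  vpc_peering_connection_id = "pcx-5660d13f"
}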

Question: using `self` instead of defining CIDR on security group

Hiya, team!

Reading over your security group module and was curious: why define a CIDR for the internal security groups when you could set self = true? I believe they'll accomplish the same thing and it'd be one less variable to pass around!

So instead of:

resource "aws_security_group" "internal_elb" {
  // trimmed...
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["${var.cidr}"]
  }
}

Do:

resource "aws_security_group" "internal_elb" {
  // trimmed...
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    self        = true
  }
}

From Terraform's docs:

If true, the security group itself will be added as a source to this ingress|egress rule.

Question: Running one-off tasks

How do you run one-off tasks, such as running a script or a database migration against a container/task definition?

I currently have a very similar terraform setup but I run one-off tasks through the aws cli.

[question] ALB instead of ELB?

Are there any strong reasons not to use the ALB for the services instead of an ELB? If not I'm happy to submit a PR which changes this.

Document known-compatible Terraform version(s)

It would be great to include a list of known-compatible / recommended versions of Terraform in the top-level README. I have come across a number of issues (myself included) that appear to stem from folks using a newer version of Terraform than what Stack was developed against / supports.

Examples:

  • #59 - subnet_ids expect string but is list
  • #39 - Stack and Terraform v0.7.0
  • #57 - Resource count can't reference module variable (Question)

I see a comment from 7/27/2016 that indicates the stack team is using v0.6.15. Is this still valid? If not, what's the current recommended version of Terraform to use?

[Question] IAM Profile

What IAM privileges are necessary for this to work? Can you provide a specific list, preferably in the readme?

desired_capacity clashes with alarm-based scaling

This is in reference to the ecs-cluster module.

I may be wrong on this (please correct me if I am), but doesn't setting desired_capacity on your autoscaling group conflict with the alarm-based scaling? If your ASG is scaling up because of high CPU/memory reservation and you run terraform apply at the same time, it will knock your instance count back down to the number specified in your code (alarm-based scaling modifies the group's desired capacity). See this thread for reference.

desired_capacity is an optional argument for aws_autoscaling_group, so maybe it would be better to take it out (or at least not set a default).
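One possible mitigation, sketched here as an assumption rather than something the module currently does, is to keep desired_capacity for the initial create but tell Terraform to ignore later drift on it, so that alarm-based scaling owns the value afterwards:

resource "aws_autoscaling_group" "main" {
  # ... other arguments unchanged ...

  lifecycle {
    ignore_changes = ["desired_capacity"]
  }
}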

[Question] Base AMI bootstrapping

In your bootstrap scripts I noticed you directly set all values. Have you tried using cloud-init for this? (I am asking just because we use Ubuntu 14.04 and cloud-init for the same.) Maybe you already tried and it doesn't work with 16.04/systemd?

Thanks.

can not destroy due to cycle

Thanks for the awesome project. I have enjoyed checking it out.

My plan consists of:

module "autoreg_stack" {
  source      = "github.com/segmentio/stack"
  environment = "iac"
  key_name    = "autoreg"
  name        = "autoreg"
  domain_name = "autoreg.local"
  region      = "us-west-2"
  availability_zones  = "us-west-2a,us-west-2b,us-west-2c"
  cidr        = "192.168.0.0/16"
  internal_subnets  = "192.168.0.0/19,192.168.64.0/19,192.168.128.0/19"
  external_subnets  = "192.168.32.0/20,192.168.96.0/20,192.168.160.0/20"
  ecs_instance_type = "t2.large"
  ecs_instance_ebs_optimized  = false
}

And if I try to destroy, I get a cycle error whose source I can't find:

destroy:
    $(TF)  plan -detailed-exitcode -destroy -out=destroy.tfplan
    $(TF)  apply destroy.tfplan
$ make destroy
There are warnings and/or errors related to your configuration. Please
fix these before continuing.

Errors:

  * 1 error(s) occurred:

* Cycle: module.autoreg_infra.module.ecs_cluster.aws_cloudwatch_metric_alarm.memory_high (destroy), module.autoreg_infra.module.ecs_cluster.aws_cloudwatch_metric_alarm.cpu_high (destroy), module.autoreg_infra.module.ecs_cluster.aws_autoscaling_policy.scale_up (destroy), module.autoreg_infra.module.ecs_cluster.aws_cloudwatch_metric_alarm.cpu_low (destroy), module.autoreg_infra.module.vpc.output.internal_subnets, module.autoreg_infra.module.ecs_cluster.var.subnet_ids, module.autoreg_infra.module.ecs_cluster.aws_cloudwatch_metric_alarm.memory_low (destroy), module.autoreg_infra.module.ecs_cluster.aws_autoscaling_policy.scale_down (destroy), module.autoreg_infra.module.vpc.output.availability_zones, module.autoreg_infra.module.ecs_cluster.var.availability_zones, module.autoreg_infra.output.external_elb, module.autoreg_infra.module.security_groups.aws_security_group.external_elb (destroy), module.autoreg_infra.module.security_groups.aws_security_group.internal_ssh (destroy), module.autoreg_infra.module.bastion.var.subnet_id, module.autoreg_infra.module.security_groups.output.external_ssh, module.autoreg_infra.module.bastion.var.security_groups, module.autoreg_infra.module.vpc.aws_subnet.external (destroy), module.autoreg_infra.module.vpc.aws_subnet.external, module.autoreg_infra.module.vpc.output.external_subnets, module.autoreg_infra.module.security_groups.aws_security_group.external_ssh (destroy), module.autoreg_infra.output.ecs_cluster_security_group_id, module.autoreg_infra.module.iam_role.aws_iam_role.default_ecs_role (destroy), module.autoreg_infra.module.iam_role.output.profile, module.autoreg_infra.module.ecs_cluster.var.security_groups, module.autoreg_infra.module.security_groups.output.external_elb, module.autoreg_infra.module.ecs_cluster.output.security_group_id, module.autoreg_infra.module.ecs_cluster.aws_security_group.cluster (destroy), module.autoreg_infra.module.security_groups.output.internal_ssh, module.autoreg_infra.module.security_groups.var.vpc_id, module.autoreg_infra.module.security_groups.output.internal_elb, module.autoreg_infra.output.internal_elb, module.autoreg_infra.module.vpc.output.id, module.autoreg_infra.module.ecs_cluster.var.vpc_id, module.autoreg_infra.module.security_groups.aws_security_group.internal_elb (destroy), module.autoreg_infra.module.vpc.aws_vpc.main (destroy), module.autoreg_infra.module.vpc.aws_subnet.internal (destroy), module.autoreg_infra.module.vpc.aws_subnet.internal, module.autoreg_infra.module.ecs_cluster.aws_autoscaling_group.main (destroy), module.autoreg_infra.module.ecs_cluster.aws_launch_configuration.main (destroy), module.autoreg_infra.module.iam_role.aws_iam_instance_profile.default_ecs (destroy), module.autoreg_infra.module.ecs_cluster.var.iam_instance_profile


There seem to be several Terraform issues related to create_before_destroy = true, but this seems Stack-specific and I am not finding the solution.

Stack and Terraform v0.7.0

Hi,

I tried to launch the cluster with Terraform v0.7.0 and found the following:

  • Had to change s3-logs/main.tf

data "template_file" "policy" {
template = "${file("${path.module}/policy.json")}"

vars = {
bucket = "${var.name}-${var.environment}-logs"
account_id = "${var.account_id}"
}
}

resource "aws_s3_bucket" "logs" {
bucket = "${var.name}-${var.environment}-logs"

tags {
Name = "${var.name}-${var.environment}-logs"
Environment = "${var.environment}"
}

policy = "${data.template_file.policy.rendered}"
}

  • For the error in ecs-cluster/main.tf: template_file.cloud_config: using template_file as a resource is deprecated; consider using the data source instead

I changed that but it does not work

I then reverted the changes in ecs-cluster/main.tf and did a terraform apply and it worked since it was a warning.

After I include:

module "rds" {
  source          = "github.com/segmentio/stack/rds-cluster" # rds module source
  name            = "pingdummy"
  database_name   = "pingdummy"
  master_username = "root"
  master_password = "password"

  # these options are automatically generated by the stack
  environment        = "${module.stack.environment}"
  vpc_id             = "${module.stack.vpc_id}"
  security_groups    = "${module.stack.cluster_security_group_id}"
  subnet_ids         = "${module.stack.private_subnets}"
  availability_zones = "${module.stack.availability_zones}"
}

I get the following error:
Error configuring: 2 error(s) occurred:

  • module.rds: missing dependency: module.stack.output.private_subnets
  • module.rds: missing dependency: module.stack.output.cluster_security_group_id

What am I doing wrong?
