degica / barcelona

PaaS built on top of AWS

License: MIT License

Ruby 94.75% CSS 0.12% HTML 2.63% Makefile 0.20% Shell 0.02% Go 1.15% JavaScript 0.33% Python 0.52% Dockerfile 0.29% Procfile 0.01%

aws barcelona docker ecs paas rails

barcelona's Introduction

⚔️ Degica Quest ⚔️

Welcome brave Ruby warrior. An epic adventure awaits you.

🛠 How to Play

Install the rubygem

gem install degica

And then execute:

$ degica

💪 Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/degica/degica. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

©️ License

MIT

barcelona's People

Contributors

Stargazers

Watchers

Forkers

jacko972 agahchen project-kotinos thoiberg sangqt iq-scm

barcelona's Issues

Track deployment history

I have no idea how to implement this yet, but it would be very useful if Barcelona stored deployment history.
With deployment history we could support something like bcn deploy rollback.

Support Kinesis Stream as log destination

This is an internal request. Currently Barcelona supports only Logentries as the log destination, which is not flexible enough for our log requirements.
We need to send some logs to Logentries and others to Elasticsearch.
It doesn't make sense for Barcelona to support many log drivers. Instead I want to support Kinesis Streams so that Barcelona users can pull logs from a stream by implementing a "puller" component (usually a Lambda function, I guess).

Find a way to deploy Barcelona to ECS cluster

The current Barcelona API is deployed on Heroku at https://barcelona-demo.herokuapp.com, but for production Barcelona has to be deployed to a trusted infrastructure: ECS + secure VPC.

Because Barcelona cannot be used to deploy Barcelona, I have to write a CloudFormation template that describes the ECS cluster, service, and task definition dedicated to the Barcelona API service.

Use cloud-config in EC2 userdata

Currently Barcelona uses a bash script for EC2 userdata. It should be written in cloud-config format because that makes long-running services easier to maintain.

can't input multi byte characters on run shell

When I ran bcn run I couldn't input multibyte characters like こんにちは

HTTP Proxy

Looks like the ECS agent has started supporting HTTP proxies: aws/amazon-ecs-agent#211
It's still an experimental feature. Once it becomes stable, I'd like to set up an HTTP proxy (squid) in Barcelona clusters so that I can remove the NAT instances.

Support awslogs log driver

ECS finally added support for AWS CloudWatch Logs.

I think awslogs is the primary choice of log destination for most users, so we should support it.

proxy env vars are not set when `bcn run`

The proxy plugin injects env vars when Barcelona builds the ECS task definition, and they are not included in the env_vars table.

Expire login token

Login tokens should expire for better security. For now I'll start with a fixed 1-day expiration.
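
A minimal sketch of a fixed one-day expiration check, assuming each token records the time it was issued. The `LoginToken` module and its method are hypothetical names for illustration, not Barcelona's actual code.

```ruby
# Hypothetical helper: a token is valid for a fixed 1 day after it was issued.
module LoginToken
  TTL_SECONDS = 24 * 60 * 60

  # issued_at and now are Time objects; now defaults to the current time.
  def self.expired?(issued_at, now = Time.now)
    now - issued_at >= TTL_SECONDS
  end
end
```

Passing `now` explicitly keeps the check easy to test without freezing the clock.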

Add swap file

Sometimes a container instance dies. I don't know why, but I guess it runs out of memory.

I thought ECS instances could work without swap because the memory limit of each container is strictly defined and there is an additional 128MB of reserved memory that ECS does not use.

But in general it is good practice to have swap.

Scheduled tasks(cron)

This is the last essential feature that Barcelona currently doesn't have. Even without it you can run a cron container for scheduled jobs, but that is not resource-efficient and it forces every application to implement cron tasks in its own way.

What I'd like to implement is the following heritage configuration:

{
  "name": "komoju-production",
  "services": [
    // ...
  ],
  // ...
  "scheduled_tasks": [
    {
      "schedule": "*/15 * * * *",
      "command": "rake cron:expire_old_payments"
    }
  ],
  // ...
}
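
As a rough illustration of how a `schedule` string like the one above could be evaluated, here is a small sketch. It is an assumption-laden simplification, not Barcelona's implementation: it supports only `*`, `*/n`, and plain integers in each of the five fields, requires every field to match, and the method names are hypothetical.

```ruby
# Minimal matcher for five-field cron expressions. Supports only "*", "*/n",
# and plain integers; ranges, lists, and names are intentionally not handled.
# `min` is the lowest legal value of the field (0 for minute/hour/weekday,
# 1 for day-of-month/month), because "*/n" steps from the start of the range.
def cron_field_match?(field, value, min = 0)
  case field
  when "*" then true
  when %r{\A\*/(\d+)\z} then ((value - min) % Regexp.last_match(1).to_i).zero?
  when /\A\d+\z/ then field.to_i == value
  else raise ArgumentError, "unsupported cron field: #{field}"
  end
end

# Returns true when the task should run at the given minute.
def cron_due?(schedule, time)
  minute, hour, day, month, weekday = schedule.split
  cron_field_match?(minute, time.min) &&
    cron_field_match?(hour, time.hour) &&
    cron_field_match?(day, time.day, 1) &&
    cron_field_match?(month, time.month, 1) &&
    cron_field_match?(weekday, time.wday)
end
```

A scheduler could call `cron_due?` once per minute for each entry in `scheduled_tasks` and start a one-off task for the matching commands.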

AWS WAF

Fine-grained permission control per team

The current permission control is so simple that we cannot control which districts and services a user can access. I think per-user permission is overkill, so I propose per-team permission as follows.

`Team` resource

I propose adding a Team resource (model) with the following attributes:

| Name | Default | Description |
| --- | --- | --- |
| districts | `[]` | A list of districts that team members can access |
| role | `member` | One of `admin`, `developer`, or `readonly` |
| github_team | | Name of the GitHub team that the Barcelona team is linked to |
| github_organization | | Name of the GitHub organization that the Barcelona team is linked to |

`User`

Each user can belong to multiple teams
A user should first provide a GitHub token
- Barcelona gets the user's GitHub information and decides which teams the user belongs to

Roles

Each Team has one of the following roles.

`admin` role

The admin role can do anything to the districts the team belongs to

`developer` role

The developer role can do anything admin can do except the following actions:

updating / deleting / creating a district
launching / terminating a container instance
deleting a heritage
updating / creating / deleting a team
updating / deleting a user (except themselves)

`readonly` role

Maybe we don't need this role?

The readonly role can do only the following actions:

show a district
show a heritage
show a team
show a user
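
The role rules listed above can be sketched as a simple lookup. This is only an illustration under assumptions: the action names are hypothetical symbols, every action is treated as scoped to a district for simplicity, and the "except themselves" rule is modelled by separate `*_other_user` actions.

```ruby
# Hypothetical action names; the rules follow the role descriptions above.
DEVELOPER_DENIED = %i[
  update_district delete_district create_district
  launch_instance terminate_instance
  delete_heritage
  update_team create_team delete_team
  update_other_user delete_other_user
].freeze

READONLY_ALLOWED = %i[show_district show_heritage show_team show_user].freeze

Team = Struct.new(:role, :districts)

# Returns true when a member of `team` may perform `action` in `district`.
def allowed?(team, action, district)
  return false unless team.districts.include?(district)

  case team.role
  when :admin then true
  when :developer then !DEVELOPER_DENIED.include?(action)
  when :readonly then READONLY_ALLOWED.include?(action)
  else false
  end
end
```

Since a user can belong to several teams, a caller would grant the action if any of the user's teams allows it.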

@essa what do you think about this? Is this too complicated or not so flexible?

Integrate with Spot Fleet

This is not a required feature, but it would be very interesting if Barcelona could use AWS Spot Fleet.

Add automated deploy trigger

This is not a required feature, but it would be very useful if Barcelona had a deploy trigger API. The workflow would be as follows:

new commits are pushed to the GitHub master branch
GitHub sends a webhook to quay.io (docker registry)
quay.io starts building a docker image from the newest commit
when finished, quay.io sends a webhook to Barcelona (this is the trigger API)
Barcelona calls ECS's update_service API for the staging environment

users are not added to a newly created district

Save (encrypted) AWS keys in DB

(If the infra migrations complete successfully) we will have at least 3 districts:

Komoju (AWS komoju account)
Degica production (degica2 account)
Degica staging (degica3 account)

Those districts will belong to different AWS accounts, so Barcelona needs to store AWS credentials in the DB per district.
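
One possible way to encrypt a secret before writing it to the DB is authenticated encryption from Ruby's standard `openssl` library. The sketch below is an assumption, not the scheme Barcelona uses: `SecretBox` is a hypothetical name, and how the 32-byte master key is stored and rotated is left out.

```ruby
require "openssl"

# Hypothetical helper: encrypts a secret with AES-256-GCM before it is stored.
# The key must be 32 bytes and must be kept outside the database.
module SecretBox
  def self.encrypt(plaintext, key)
    cipher = OpenSSL::Cipher.new("aes-256-gcm")
    cipher.encrypt
    cipher.key = key
    iv = cipher.random_iv # a fresh IV for every encryption
    cipher.auth_data = ""
    ciphertext = cipher.update(plaintext) + cipher.final
    { iv: iv, tag: cipher.auth_tag, data: ciphertext }
  end

  # Raises OpenSSL::Cipher::CipherError if the key is wrong or the data was modified.
  def self.decrypt(box, key)
    cipher = OpenSSL::Cipher.new("aes-256-gcm")
    cipher.decrypt
    cipher.key = key
    cipher.iv = box[:iv]
    cipher.auth_tag = box[:tag]
    cipher.auth_data = ""
    cipher.update(box[:data]) + cipher.final
  end
end
```

The IV and authentication tag are not secret, so they can be stored in the same row as the ciphertext.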

Web frontend

Does somebody want to try?

m4.large instance failed to launch

When I tried to change the cluster instance type to m4.large, the init script failed with this error message: error reading information on service barcelona: Invalid argument

I had never seen this error until today, so I think the current configuration cannot work with m4.large instances (and probably other instance types; so far I've only used t2.* instance types).

Integrate with AutoScaling

The current algorithm of picking a subnet for a new instance is:

subnet_id: section.subnets.sample.subnet_id

Which is really bad for high availability :(
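
Random sampling can put every instance in the same availability zone by chance. As a stopgap illustration, a more even choice could pick the subnet that currently hosts the fewest instances. The `Subnet` struct, `pick_subnet`, and the `instance_counts` hash are hypothetical stand-ins; an Auto Scaling group would handle this balancing itself.

```ruby
# Hypothetical sketch: pick the subnet that currently hosts the fewest
# instances, so new instances spread evenly across availability zones.
Subnet = Struct.new(:subnet_id, :availability_zone)

# instance_counts maps subnet_id => number of running instances.
def pick_subnet(subnets, instance_counts)
  subnets.min_by { |subnet| instance_counts.fetch(subnet.subnet_id, 0) }
end
```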

Create degica organization on quay.io

we have to pay 💵

Fluentd integration

The current rsyslog and Logentries logger is not flexible enough.

Replace system tasks with EC2 Run Command API

Now that AWS provides the Run Command API in the Tokyo region, we can use it to manage systems.
Currently these system tasks are done through ECS's RunTask API, which is a rather magical approach, so I want to replace it with more straightforward shell scripting.

Provide CloudFormation template for a district

Related #125

The reason I haven't provided a district template is that users may want to set up their VPC differently. A user may need

VPC with public subnets only
VPC with public/private subnets and NAT instances
- they may want to set up high availability NAT instances
VPC with public/private subnets and outbound proxy services

So the desired configuration differs completely depending on each user's needs.

But with the NAT gateway there is no reason not to choose public/private subnets with the managed NAT gateway. Barcelona, as an opinionated private PaaS powered by ECS, should provide a pre-configured VPC template.

Implement user roles and permissions

We need two user roles, mostly because of PCI requirements:

admin
- can access all districts
- can sudo in container instances
- can register non-admin users' public keys
non-admin
- cannot sudo
- can access only districts permitted by admin users

To keep things simple, I will just use GitHub teams:

The GitHub admin developers team maps to Barcelona's admin users
The GitHub developers team maps to Barcelona's non-admin users

Find a way to safely deregister and terminate container instances

Currently the ECS agent doesn't have a way to safely terminate ECS container instances that run long-running services or background jobs. See aws/amazon-ecs-agent#130 and aws/amazon-ecs-agent#126.

What I want to do is:

Add a "terminate container instances" endpoint to Barcelona
When the endpoint is called, Barcelona calls the StartTask API to run a utility container that deregisters the instance, stops running containers, and finally terminates the instance.

limit access to environment variables

Currently any user can get and post environment variables, which could be a security hole. Our first step would be introducing a HeritagePolicy that defines who can access the environment variables of a particular heritage.

Confirm (and change) logentries TLS configuration

When I set up the Logentries plugin, I followed the official documentation (if I remember correctly) and configured api.logentries.com:20000 for the TLS-encrypted TCP connection. The documentation has since changed (https://logentries.com/doc/rsyslog/) and now says data.logentries.com:443 should be used.

I don't know which configuration is correct, so I sent an email to Logentries. If they say data.logentries.com:443 should be used (which means api.logentries.com:20000 is legacy?), I'll update the Logentries plugin configuration.

API Schema

JSON Hyper-Schema? Swagger? Maybe JSON Hyper-Schema is the better choice.

Enable proxy protocol for ELB by default

Some apps (including basecamp2) will use ELB as a TCP load balancer, which means the app cannot get the original client IP via X-Forwarded-For. Barcelona's default ELB setting is TCP LB, so it's reasonable to enable proxy protocol by default.

Ref: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/enable-proxy-protocol.html

Don't retry delayed job if it fails

Retrying failed jobs is meaningless. At least for now, if a job fails, something is wrong in the Barcelona code and retrying will not fix it. It's enough to send the error to Slack.
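
One way this could look, assuming the delayed_job gem is in use as the title suggests: delayed_job honors a `max_attempts` method and an `error` hook when they are defined on a custom job object. The `DeployJob` class and its class-level `notifier` are hypothetical names for illustration.

```ruby
# Sketch of a custom job object for the delayed_job gem. delayed_job calls
# max_attempts, perform, and error on the job object if they are defined.
class DeployJob
  class << self
    # Hypothetical hook: anything that responds to call(message),
    # for example a small Slack client wrapper.
    attr_accessor :notifier
  end

  def max_attempts
    1 # fail permanently on the first error instead of retrying
  end

  def perform
    # ... the actual work would go here ...
  end

  def error(_job, exception)
    self.class.notifier.call("Job failed: #{exception.message}")
  end
end
```

The same limit can be applied to all jobs with `Delayed::Worker.max_attempts = 1` in an initializer.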

Integrate with VPC NAT gateway

I'll work on this once CloudFormation supports the NAT gateway. With the NAT gateway we don't need the proxy plugin or public sections.

create .docker/config

Since #59 was merged, ~/.docker/config has not been created in a cloud-init execution.

Make sure that all degica delayed jobs finish in 30 seconds

When ECS's stop_task is executed, docker first sends SIGTERM to the container. If the container has still not stopped 30 seconds later, docker sends SIGKILL, the same signal that kill -9 sends. That 30-second timeout cannot be customized.

For example, when I deploy a new version of an application, ECS first spins up a new container with the new image, and when the new container reaches a steady state, ECS tries to stop the old containers. If delayed_job running inside an old container is processing a long-running job, the job will be forcibly killed after 30 seconds.
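
To illustrate the constraint, here is a sketch of a worker loop that stops picking up new jobs once a stop is requested, so that only the job already in progress has to finish inside the 30-second window. `GracefulWorker` is a hypothetical class, not delayed_job's own worker.

```ruby
# Hypothetical worker loop: once a stop is requested, no new job is started,
# but the job currently running is allowed to finish.
class GracefulWorker
  def initialize(jobs)
    @jobs = jobs # array of callables, processed in order
    @stopping = false
  end

  def request_stop
    @stopping = true
  end

  # Runs jobs until the queue is empty or a stop was requested.
  # Returns the results of the jobs that were processed.
  def run
    processed = []
    until @stopping || @jobs.empty?
      job = @jobs.shift
      processed << job.call
    end
    processed
  end
end

# Usage sketch:
#   worker = GracefulWorker.new(queue)
#   Signal.trap("TERM") { worker.request_stop }
#   worker.run
```

This only helps if every individual job is short enough to complete within the grace period, which is what the issue asks to verify.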

HTTP health check

Currently Barcelona only supports TCP health checks.

Can't patch user when container instances are zero

I failed to update the public_key of '/user'.

I tested it in the Rails console and found that it raises Aws::ECS::Errors::InvalidParameterException: Container Instances cannot be empty.

User tries to run UpdateUserTask for every District when public_key is updated, but some Districts may have no container instances. Such DistrictSections should be skipped.

https://github.com/degica/barcelona/blob/master/app/models/user.rb#L92

I think SystemTask#run should skip executing the task if container_instance_arns is empty.

https://github.com/degica/barcelona/blob/master/app/services/system_task.rb#L13

@k2nr

I will make a PR for it, but I'd like to confirm whether you prefer checking this at an earlier point, such as DistrictSection#update_instance_user_account.

https://github.com/degica/barcelona/blob/master/app/models/district_section.rb#L88
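
The guard proposed for SystemTask#run could look roughly like the following. `SystemTaskSketch` is a hypothetical stand-in, not the actual class in the repository, and the injected `ecs_client` is assumed to respond to `start_task` the way the AWS SDK's ECS client does.

```ruby
# Hypothetical stand-in for SystemTask#run: return early when there are no
# container instances, instead of calling ECS with an empty list.
class SystemTaskSketch
  def initialize(ecs_client, container_instance_arns)
    @ecs = ecs_client
    @container_instance_arns = container_instance_arns
  end

  def run(task_definition)
    return [] if @container_instance_arns.empty?

    @ecs.start_task(
      task_definition: task_definition,
      container_instances: @container_instance_arns
    )
  end
end
```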

Support before_deploy scripts

To make it possible to run migrations before a deploy starts, a before_deploy script should be supported in the heritage JSON.

I'm planning to support the following JSON format:

{
    "name": "my-application",
    "container_name": "quay.io/my-application",
    "container_tag": "v100",
    "before_deploy": [  // Adding this
        "bundle exec rake db:migrate"
    ],
    "services": [
        {
            "name": "web",
            "cpu": 1024,
            "memory": 512,
            "public": false,
            "port_mappings": [
                {"lb_port": 80, "container_port": 3000}
            ]
        }
    ]
}
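
A sketch of how the `before_deploy` list might be processed: run each command in order and abort at the first failure so the deploy never starts on a failed migration. `run_before_deploy` and the injected `runner` are hypothetical; `runner` stands in for whatever executes a command in a one-off container and returns true on success.

```ruby
# Hypothetical sketch: run each before_deploy command in order and abort the
# deployment at the first failure.
def run_before_deploy(heritage, runner)
  heritage.fetch("before_deploy", []).each do |command|
    raise "before_deploy failed: #{command}" unless runner.call(command)
  end
  true
end
```

A heritage without a `before_deploy` key is treated as having nothing to run.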

Rename "heritage" to "app"

[PCI DSS plugin] fail2ban

for 8.1.6 and 8.1.7

Slack notifications

deploy lock

If 2 or more deployments are triggered simultaneously, something bad could happen. It's hard to guess exactly what, but maybe Barcelona should lock deployments.
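
As an illustration of the idea, here is an in-process lock keyed by heritage name. It is a sketch under a strong assumption: all deploys run in one Ruby process. With several API or worker processes, a shared lock (for example a database row) would be needed instead. `DeployLock` is a hypothetical name.

```ruby
# Hypothetical in-process deploy lock keyed by heritage name.
class DeployLock
  def initialize
    @guard = Mutex.new
    @locked = {}
  end

  # Runs the block only if no other deploy holds the lock for this heritage.
  # Returns true if the block ran, false if the lock was already taken.
  def with_lock(heritage_name)
    acquired = @guard.synchronize do
      next false if @locked[heritage_name]

      @locked[heritage_name] = true
    end
    return false unless acquired

    begin
      yield
      true
    ensure
      @guard.synchronize { @locked.delete(heritage_name) }
    end
  end
end
```

Rejecting the second deploy, rather than queueing it, keeps the behavior predictable; the caller can report that a deploy is already in progress.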

Speed up specs

Why is it so slow? Currently running all specs (107 examples) takes 76 seconds.

Heritage access key

In most automation cases we don't need access keys that can access everything. For example, Travis deployments only need access to a single heritage.

Support ECR

With ECR, ecs-agent gets permission to pull images via the ECR API, which means a user doesn't need to set up dockercfg. That makes the initial setup easier.

I think there are 2 issues:

Unlike ecs-agent, bcn run depends on the standard docker pull, so bcn run doesn't work without an additional dockercfg setting.
Not related to Barcelona itself, but we depend heavily on quay automated builds in our deployment pipeline, and ECR doesn't have an automated image build solution.

Add Slack slash commands support

This should be useful for all Slack + Barcelona users.

terminate_instance API doesn't terminate an EC2 instance

When the terminate_instance API is called, Barcelona performs a safe termination by running k2nr/ecs-instance-terminator on the target instance.

ecs-instance-terminator performs the following steps sequentially:

Deregister the container instance from the ECS cluster
Stop all docker containers
sleep $STOP_TIMEOUT (120 by default)
Terminate the instance

Only the final step doesn't work as expected.
