A CLI that wraps calls to the aws cli for s3 synchronization and pg_dump/psql for RDS/postgres database "migration". Both data stores are unidirectionally copied from template deploy environments (source) to live-1 (destination).
To build the docker image using the provided script you will need credentials for accessing the live-1 cccd ECR registry. The script assumes you have valid aws credentials stored in a profile named ecr-live1.
$ cd .../cccd-migrator
$ .k8s/build.sh
The docker image is intended to be deployed as a container in a standalone pod within the namespace hosting the destination s3 bucket and RDS instance, e.g. cccd-dev.
# deploy master branch cccd-template-deploy-migrator to the cccd-dev namespace
$ cd .../cccd-migrator
$ .k8s/deploy.sh dev
# deploy a branch of cccd-template-deploy-migrator to the cccd-dev namespace
$ cd .../cccd-migrator
$ .k8s/deploy.sh dev <my-branch>-latest|<commit-sha>
There is a cronjob that can be applied to schedule unattended s3 sync:
# apply the cronjob for syncing s3 between cccd-dev namespace's s3 bucket and TD dev's s3 bucket
$ cd .../cccd-migrator
$ .k8s/sync_s3_cronjob.sh dev
It is intended that the migration task be run once via the pod and, thereafter, as a cronjob. The cronjob synchronizes the live-1 s3 bucket with template-deploy every hour, on the hour, in the namespace it is applied to.
The cronjob(s) will need to be deleted once a final sync is done on "d-day":
kubectl --context ${context} -n cccd-${environment} delete cronjob sync_s3_cronjob
Note that the wrapped aws s3 sync command includes the --delete option. This will delete objects in destination that do not exist in source.
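Because --delete removes destination objects, it can help to preview a sync before running it. The sketch below is not the command that bin/migrate wraps (that command is not shown in this README). It only assembles a comparable aws s3 sync invocation with --dryrun added, using the s3 environment variables this README lists. The function name preview_sync_cmd is hypothetical.

```shell
# Hypothetical helper: print (do not run) an `aws s3 sync` dry-run command.
# Assumes SOURCE_AWS_REGION, AWS_REGION, SOURCE_AWS_S3_BUCKET_NAME and
# DESTINATION_AWS_S3_BUCKET_NAME are set; `${VAR:?}` aborts if one is not.
preview_sync_cmd() {
  printf 'aws s3 sync --dryrun --delete --source-region %s --region %s s3://%s s3://%s\n' \
    "${SOURCE_AWS_REGION:?}" "${AWS_REGION:?}" \
    "${SOURCE_AWS_S3_BUCKET_NAME:?}" "${DESTINATION_AWS_S3_BUCKET_NAME:?}"
}
```

Running the printed command should list the planned operations, prefixed with (dryrun), without copying or deleting anything.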
Output cli help:
bin/migrate -h
Produce a summary report of source and destination objects:
bin/migrate s3 --report -ym
Synchronize destination with source:
bin/migrate s3 --sync -ym
Delete all objects in destination bucket, for testing purposes only:
bin/migrate s3 --empty -ym
Note: the --sync option already deletes objects in destination that are not in source, so --empty is purely for testing purposes.
The "migration" of a single postgres database can be achieved using this utility.
The CLI will:
- terminate existing connections on the destination database
- drop the existing destination database
- create an empty destination database with the same name and owner
- produce plain text dump files from the source database
- apply those dump files to the empty destination database
Recreate destination database using source database:
$ bin/migrate rds --sync -ym
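The five steps the CLI performs can be sketched as plain psql and pg_dump commands. This is an illustration under stated assumptions, not the CLI's actual implementation: the function name rds_sync_plan, its arguments, the single dump.sql file and the use of a separate maintenance connection are all hypothetical. It assumes simple, unquoted database and role names. The function only prints the commands.

```shell
# Hypothetical sketch: print (not run) one command per step listed above.
# Arguments (all assumptions for illustration):
#   $1 source connection URL
#   $2 destination maintenance URL (e.g. the "postgres" database)
#   $3 destination database URL
#   $4 destination database name
#   $5 destination database owner
rds_sync_plan() {
  src_url=$1 admin_url=$2 dest_url=$3 db=$4 owner=$5
  cat <<EOF
psql "$admin_url" -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '$db' AND pid <> pg_backend_pid();"
psql "$admin_url" -c "DROP DATABASE IF EXISTS $db;"
psql "$admin_url" -c "CREATE DATABASE $db OWNER $owner;"
pg_dump --format=plain --file=dump.sql "$src_url"
psql "$dest_url" -v ON_ERROR_STOP=1 --file=dump.sql
EOF
}
```

The maintenance connection is used for the first three steps because postgres does not allow dropping the database that the current session is connected to.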
In order for the CLI's s3 synchronization to work several setup steps are required.
The IAM user of the s3 destination bucket must have a policy that grants the list and object actions it needs on both its own bucket AND the source bucket.
# example user policy for terraform file - s3.tf
user_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": [
        "$${bucket_arn}",
        "arn:aws:s3:::example-source-bucket-name"
      ]
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "$${bucket_arn}/*",
        "arn:aws:s3:::example-source-bucket-name/*"
      ]
    }
  ]
}
EOF
The source (template-deploy) s3 bucket must have a bucket policy that allows the destination s3 user to list bucket and read/copy objects.
To do this you must first retrieve the destination s3 user's ARN for use in the source bucket policy:
# retrieve destination s3 user ARN
$ unset AWS_PROFILE ; read K a n S <<<$(kubectl -n my-namespace get secret my-s3-secrets -o json | jq -r '.data[] | @base64d') ; export AWS_ACCESS_KEY_ID=$K ; export AWS_SECRET_ACCESS_KEY=$S ; aws sts get-caller-identity
You should get output similar to below:
{
  "UserId": "<ALPHANUMERIC>",
  "Account": "<LONG-INTEGER>",
  "Arn": "arn:aws:iam::<LONG-INTEGER>:user/system/s3-bucket-user/<team>/<random-s3-bucket-username>"
}
You can then create a bucket policy on the source bucket you want to sync (copy) data from.
Example source bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCccdSourceBucketAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<accountid>:user/system/s3-bucket-user/<team>/s3-bucket-user-random"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::example-source-bucket",
        "arn:aws:s3:::example-source-bucket/*"
      ]
    }
  ]
}
Note: these settings limit the destination IAM user's access to the source bucket to list, get and copy type actions only.
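To avoid hand-editing the JSON, the source bucket policy can be generated from the ARN returned by aws sts get-caller-identity. This is a minimal sketch: the function name render_source_bucket_policy and its two arguments are hypothetical, and the statement mirrors the example policy in this README.

```shell
# Hypothetical helper: print a source bucket policy.
#   $1 destination s3 user's ARN (the "Arn" value from the sts output)
#   $2 source bucket name
render_source_bucket_policy() {
  principal_arn=$1 bucket=$2
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCccdSourceBucketAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "$principal_arn" },
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": ["arn:aws:s3:::$bucket", "arn:aws:s3:::$bucket/*"]
    }
  ]
}
EOF
}

# Example (not run here): write the policy to a file for later use.
# render_source_bucket_policy "$DESTINATION_USER_ARN" example-source-bucket > policy.json
```

With credentials for the account that owns the source bucket, the resulting file could then be applied with aws s3api put-bucket-policy --bucket example-source-bucket --policy file://policy.json.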
You must supply the following environment variables for s3 commands to function:
- source region: SOURCE_AWS_REGION
- source bucket name: SOURCE_AWS_S3_BUCKET_NAME
- destination region: AWS_REGION
- destination bucket name: DESTINATION_AWS_S3_BUCKET_NAME
In order for the CLI's rds "synchronization" to work several setup steps are required.
The source database will need to provide permissions to the live-1 cluster to enable pg_dump to read the data. To summarise, this requires the source db instance to be public but its VPC security group to whitelist inbound traffic from the live-1 cluster. This can be achieved as follows:
- Login to the aws account that owns the rds instance
- navigate to RDS service
- click the "DB instances" link
- search/identify the db instance that is to be the source
- click on the db instance i.e. navigate to that instances page
- on the instance page locate the "VPC security groups" section
- click on the link to the VPC security group
- In Security group page select the "Inbound" tab
- Hit edit for Inbound rules
- In the Edit inbound rules dialog add 3 rules
- Each "Inbound rule" should be
  - Type: PostgreSQL
  - Protocol: TCP
  - Port range: 5432
  - Source: <ip-range/cidr for live-1 cluster, ask Cloud Platforms>
  - Description: live-1 cluster whitelist
- Hit save
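The console steps above could also be scripted with the aws cli. The sketch below is an assumption, not part of this project: the function name print_ingress_cmds is hypothetical, and it only prints one aws ec2 authorize-security-group-ingress command per CIDR so the commands can be reviewed before being run.

```shell
# Hypothetical helper: print one ingress command per CIDR range.
#   $1 security group id; remaining arguments are CIDR ranges for live-1
print_ingress_cmds() {
  sg_id=$1
  shift
  for cidr in "$@"; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 5432 --cidr %s\n' \
      "$sg_id" "$cidr"
  done
}

# Example (placeholder values only):
# print_ingress_cmds sg-0123456789abcdef0 192.0.2.1/32 198.51.100.1/32 203.0.113.1/32
```

This shorthand form does not set a rule description. Adding one requires the longer --ip-permissions form of the same command.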
If the database is not public (typical default) it will need to be made so.
- check its public status
  - Navigate to the specific RDS DB instance page
  - Look for "Public accessibility" within "Connectivity & security" section
- Make public (if required)
  - Navigate to the specific RDS DB instance page
  - Hit Modify button
  - On the "Modify DB Instance" page select yes radio option for "Public accessibility"
  - Hit continue on the bottom of the page
  - select "Apply immediately" radio option
    - WARNING: note that this will cause downtime
  - If happy with downtime hit "Modify DB instance" button
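For reference, a rough aws cli equivalent of the check and modify steps is sketched below. The function names and the instance identifier argument are hypothetical. The functions only print the commands, since the modify step causes downtime and should be run deliberately.

```shell
# Hypothetical helpers: print the aws cli equivalents of the console steps.
#   $1 RDS DB instance identifier

# Prints a command that reports whether the instance is publicly accessible.
print_public_check_cmd() {
  printf "aws rds describe-db-instances --db-instance-identifier %s --query 'DBInstances[0].PubliclyAccessible'\n" "$1"
}

# Prints a command that makes the instance public immediately.
# WARNING: as with the console steps, applying this causes downtime.
print_make_public_cmd() {
  printf 'aws rds modify-db-instance --db-instance-identifier %s --publicly-accessible --apply-immediately\n' "$1"
}
```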
You must supply the following environment variables for rds commands to function:
- source db url: SOURCE_DATABASE_URL
- source db name: SOURCE_DATABASE_NAME
- destination db url: DESTINATION_DATABASE_URL
- destination db username: DESTINATION_DATABASE_USERNAME
- destination db password: DESTINATION_DATABASE_PASSWORD
- destination db host: DESTINATION_DATABASE_HOST
- destination db name: DESTINATION_DATABASE_NAME
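Before running a sync it may be worth confirming that the required variables are present. The helper below is a sketch and not part of the CLI: require_env is a hypothetical name. It reports every missing variable in one pass, and the variable names in the usage comment are the ones listed above.

```shell
# Hypothetical helper: report each unset or empty variable among the names given.
# Returns 0 when all are set, 1 otherwise.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run here):
# require_env SOURCE_DATABASE_URL SOURCE_DATABASE_NAME DESTINATION_DATABASE_URL \
#   DESTINATION_DATABASE_USERNAME DESTINATION_DATABASE_PASSWORD \
#   DESTINATION_DATABASE_HOST DESTINATION_DATABASE_NAME && bin/migrate rds --sync -ym
```

The same function can be called with the four s3 variable names before an s3 sync.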