
GitHub Repository Archive Tool

A Python Flask application to archive outdated organisation repositories.

Prerequisites

This project uses Poetry for package management, Colima (a licence-free containerisation tool), the AWS CLI for interacting with cloud services, and Terraform for deploying changes.

It is expected you have these tools installed before progressing further.

Instructions to install Poetry

Instructions to install Colima

Instructions to install AWS cli tool

Terraform to configure AWS

See the section on deployment for specific requirements and prerequisites to deploy to AWS.

Setup - Run outside of Docker

Before running outside of Docker, ensure the necessary environment variables are set up locally where you are running the application. For example, on Linux or macOS you can run the following, providing appropriate values for the variables:

export AWS_ACCESS_KEY_ID=MYACCESSKEYID
export AWS_SECRET_ACCESS_KEY=MYSECRETACCESSKEY
export AWS_DEFAULT_REGION=eu-west-2
export AWS_SECRET_NAME=<aws_secret_name>
export GITHUB_ORG=ONS-Innovation
export GITHUB_APP_CLIENT_ID=<github_app_client_id>
export AWS_ACCOUNT_NAME=sdp-sandbox
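A quick startup check can confirm these variables are present before launching the app. This is a minimal sketch (the variable list mirrors the exports above; the helper itself is hypothetical, not code from the repository):

```python
import os

# Variables the application expects, per the exports above.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_SECRET_NAME",
    "GITHUB_ORG",
    "GITHUB_APP_CLIENT_ID",
    "AWS_ACCOUNT_NAME",
]

def missing_env_vars(env=None):
    """Return the names of any required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Running `missing_env_vars()` before starting the app gives a clearer failure than a boto3 credential error deep inside a request.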
  1. Navigate into the project's folder and create a virtual environment

    python3 -m venv <environment_name>
  2. Activate the environment

    source <environment_name>/bin/activate
  3. Install the required dependencies

    poetry install
  4. Get the repo-archive-github.pem file and copy to the source code root directory (see "Getting a .pem file" below).

  5. When running the project locally, you need to edit get_s3_client() within storage_interface.py.

    When creating an instance of boto3.Session(), you must pass the AWS credential profile to use, as found in ~/.aws/credentials.

    When running locally:

    session = boto3.Session(profile_name="<profile_name>")
    s3 = session.client("s3")

    When running from a container:

    session = boto3.Session()
    s3 = session.client("s3")
  6. Run the project

    poetry run python3 repoarchivetool/app.py
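As an alternative to editing get_s3_client() by hand each time (step 5), the profile decision can be driven by an environment variable. A minimal sketch, assuming a hypothetical AWS_PROFILE convention rather than the repository's actual code:

```python
import os

def session_kwargs(env=None):
    """Return keyword arguments for boto3.Session(): a profile_name when
    AWS_PROFILE is set (local runs), nothing otherwise (container runs)."""
    env = os.environ if env is None else env
    profile = env.get("AWS_PROFILE")
    return {"profile_name": profile} if profile else {}

# Inside get_s3_client() this would be used as (boto3 assumed installed):
#   session = boto3.Session(**session_kwargs())
#   s3 = session.client("s3")
```

With this in place the same code path works locally (export AWS_PROFILE=<profile_name>) and in a container (variable unset, so boto3 falls back to its default credential chain).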

Building a docker image

Build and tag the image

docker build -t repo-archive-tool .

Check the image is available locally

docker images

Example output:

REPOSITORY                                                      TAG       IMAGE ID       CREATED          SIZE
repo-archive-tool                                               latest    d9c802cef7eb   11 seconds ago   332MB

Run the image locally, mapping local host port 5000 to container port 5000 and passing in AWS credentials so the running container can download a .pem file from AWS Secrets Manager.

The credentials used in the below command are for a user in AWS that has permissions to retrieve secrets from AWS Secrets Manager.

docker run -p 5000:5000 \
-e AWS_ACCESS_KEY_ID=<aws_access_key_id> \
-e AWS_SECRET_ACCESS_KEY=<aws_secret_access_key_id> \
-e AWS_DEFAULT_REGION=eu-west-2 \
-e AWS_SECRET_NAME=<aws_secret_name> \
-e GITHUB_ORG=ONS-Innovation \
-e GITHUB_APP_CLIENT_ID=<github_app_client_id> \
-e AWS_ACCOUNT_NAME=sdp-sandbox \
repo-archive-tool

To check the container is running

docker ps 

Example output

CONTAINER ID   IMAGE               COMMAND                  CREATED          STATUS          PORTS                                       NAMES
e85a3ce5fecf   repo-archive-tool   "/app/start_repo_too…"   27 seconds ago   Up 25 seconds   0.0.0.0:5000->5000/tcp, :::5000->5000/tcp   cranky_yalow

To view the running application, navigate to http://127.0.0.1:5000 in a browser.

To stop the running container either use the container ID:

docker stop e85a3ce5fecf

or the container name

docker stop cranky_yalow

Storing the container on AWS Elastic Container Registry (ECR)

When you make changes to the application a new container image must be pushed to ECR.

These instructions assume:

  1. You have a repository set up in your AWS account named sdp-repo-archive.
  2. You have created an AWS IAM user with permissions to read/write to ECR (e.g. the AmazonEC2ContainerRegistryFullAccess policy) and have created the necessary access keys for this user. The credentials for this user are stored in ~/.aws/credentials and can be used by passing --profile <aws-credentials-profile>; if these are the only credentials in your file, the profile name is default.

You can find the AWS repo push commands under your repository in ECR by selecting the "View Push Commands" button. This will display a guide to the following (replace <aws-credentials-profile>, <aws-account-id> and <version> accordingly):

  1. Get an authentication token and authenticate your docker client for pushing images to ECR:

    aws ecr --profile <aws-credentials-profile> get-login-password --region eu-west-2 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.eu-west-2.amazonaws.com
  2. Tag your latest built docker image for ECR (assumes you have run docker build -t sdp-repo-archive . locally first)

    docker tag sdp-repo-archive:latest <aws-account-id>.dkr.ecr.eu-west-2.amazonaws.com/sdp-repo-archive:<version>

    Note: To find the <version> to use, look at the latest tagged version in ECR and increment appropriately.

  3. Push the version up to ECR

    docker push <aws-account-id>.dkr.ecr.eu-west-2.amazonaws.com/sdp-repo-archive:<version>
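The "increment appropriately" step can be scripted. A hedged sketch that computes the next patch version from a list of existing ECR image tags (the MAJOR.MINOR.PATCH scheme is an assumption; check how versions are actually tagged in your ECR repository):

```python
def next_version(tags):
    """Given existing image tags, return the next patch version.
    Assumes a MAJOR.MINOR.PATCH scheme; non-version tags (e.g. "latest")
    are ignored."""
    versions = []
    for tag in tags:
        parts = tag.lstrip("v").split(".")
        if len(parts) == 3 and all(p.isdigit() for p in parts):
            versions.append(tuple(int(p) for p in parts))
    if not versions:
        return "0.0.1"
    major, minor, patch = max(versions)
    return f"{major}.{minor}.{patch + 1}"
```

The tag list itself could come from boto3's ECR describe_images call, but the pure function above is where the increment logic lives.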

Deployment to AWS

The deployment of the service is defined in Infrastructure as Code (IaC) using Terraform. The service is deployed as a container on an AWS Fargate Service Cluster.

Deployment Prerequisites

When first deploying the service to AWS the following prerequisites are expected to be in place or added.

Underlying AWS Infrastructure

The Terraform in this repository expects that underlying AWS infrastructure is present in AWS to deploy on top of, i.e:

  • Route53 DNS Records
  • Web Application Firewall and appropriate Rules and Rule Groups
  • Virtual Private Cloud with Private and Public Subnets
  • Security Groups
  • Application Load Balancer
  • ECS Service Cluster

That infrastructure is defined in the repository sdp-infrastructure

Bootstrap IAM User Groups, Users and an ECSTaskExecutionRole

The following users must be provisioned in AWS IAM:

  • ecr-user
    • Used for interaction with the Elastic Container Registry from AWS cli
  • ecs-app-user
    • Used for terraform staging of the resources required to deploy the service

The following groups and permissions must be defined and applied to the above users:

  • ecr-user-group
    • EC2 Container Registry Access
  • ecs-application-user-group
    • Cognito Power User
    • Dynamo DB Access
    • EC2 Access
    • ECS Access
    • ECS Task Execution Role Policy
    • Route53 Access
    • S3 Access
    • Cloudwatch Logs All Access (Custom Policy)
    • IAM Access
    • Secrets Manager Access

Further to the above, an IAM role must be defined to allow ECS tasks to be executed.

Bootstrap for Terraform

To store the state and implement a state-locking mechanism for the service resources, a Terraform backend is deployed in AWS (an S3 bucket and a DynamoDB table). Details can be found in the infrastructure repository above.

Bootstrap for Secrets Manager

The GitHub Audit and Repo service requires access to an associated GitHub App secret; this secret is created when the GitHub App is installed in the appropriate GitHub Organisation. The contents of the generated .pem file are stored in AWS Secrets Manager and retrieved by this service to interact with GitHub securely.

AWS Secrets Manager must be set up with a secret:

  • /sdp/tools/repoarchive/repo-archive-github.pem
    • A plaintext secret, containing the contents of the .pem file created when the GitHub App was installed.
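Retrieving the .pem contents at runtime uses Secrets Manager's get_secret_value call. A minimal sketch with an injectable client so it can be exercised without AWS (the helper name is hypothetical):

```python
def fetch_pem(client, secret_name):
    """Fetch a plaintext secret (the GitHub App .pem contents) from
    AWS Secrets Manager via the supplied boto3-style client."""
    response = client.get_secret_value(SecretId=secret_name)
    return response["SecretString"]

# Usage (boto3 assumed installed, credentials in the environment):
#   import boto3
#   pem = fetch_pem(boto3.client("secretsmanager"),
#                   "/sdp/tools/repoarchive/repo-archive-github.pem")
```

Passing the client in, rather than constructing it inside the function, keeps the function testable with a stub.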

Running the Terraform

There are associated README files in each of the Terraform modules in this repository. When first staging the service Terraform must be run in the following order:

  • terraform/storage/main.tf
    • This provisions the persistent storage used by the service.
  • terraform/authentication/main.tf
    • This provisions the Cognito authentication used by the service.
  • terraform/service/main.tf
    • This provisions the resources required to launch the service.

The Terraform is split into separate areas so that the application can be updated without the need to re-stage authentication or persistent storage.

Depending upon which environment you are deploying to you will want to run your terraform by pointing at an appropriate environment tfvars file.

Example service tfvars file: service/env/sandbox/example_tfvars.txt

Example authentication tfvars file: authentication/env/sandbox/example_tfvars.txt

Example storage tfvars file: storage/env/sandbox/example_tfvars.txt

Provision Users

When the service is first deployed an admin user must be created in the Cognito User Pool that was created when the authentication terraform was applied.

New users are manually provisioned in the AWS Console:

  • Navigate to Cognito->User Pools and select the pool created for the service
  • Under the Users section select Create User and choose the following:
    • Send an email invitation
    • Enter the ONS email address for the user to be added
    • Select Mark email address as verified
    • Under Temporary password choose:
      • Generate a password
    • Select Create User

An email invite will be sent to the selected email address along with a one-time password which is valid for 10 days.
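The console steps above can also be scripted with the Cognito admin_create_user API. A hedged sketch (the helper name and pool ID placeholder are hypothetical, and boto3 is assumed installed):

```python
def create_admin_user(client, user_pool_id, email):
    """Invite a user to the Cognito user pool by email, mirroring the
    console steps: email invitation, verified address, generated password."""
    return client.admin_create_user(
        UserPoolId=user_pool_id,
        Username=email,
        UserAttributes=[
            {"Name": "email", "Value": email},
            {"Name": "email_verified", "Value": "true"},
        ],
        DesiredDeliveryMediums=["EMAIL"],
        # Omitting TemporaryPassword lets Cognito generate one.
    )

# Usage (hypothetical pool ID):
#   import boto3
#   create_admin_user(boto3.client("cognito-idp"),
#                     "eu-west-2_XXXXXXXXX", "someone@ons.gov.uk")
```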

Updating the running service using Terraform

If the application has been modified and the changes do not require the Cognito authentication or S3 store to be removed and re-staged (i.e. most application-level changes), then the following can be performed:

  • Build a new version of the container image and upload to ECR as per the instructions earlier in this guide.

  • Change directory to the service terraform

    cd terraform/service
  • In the appropriate environment variable file (env/sandbox/sandbox.tfvars, env/dev/dev.tfvars or env/prod/prod.tfvars):

    • Change the container_ver variable to the new version of your container.
    • Change the force_deployment variable to true.
  • Initialise terraform for the appropriate environment, using the config file backend-dev.tfbackend or backend-prod.tfbackend. E.g. for the dev environment run:

    terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure

    The -reconfigure option ensures that the backend state is reconfigured to point to the appropriate S3 bucket.

    Please Note: This step requires an AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be loaded into the environment if not already in place. This can be done using:

    export AWS_ACCESS_KEY_ID="<aws_access_key_id>"
    export AWS_SECRET_ACCESS_KEY="<aws_secret_access_key>"
  • Refresh the local state to ensure it is in sync with the backend

    terraform refresh -var-file=env/dev/dev.tfvars
  • Plan the changes, ensuring you use the correct environment config (depending upon which env you are configuring):

    E.g. for the dev environment run

    terraform plan -var-file=env/dev/dev.tfvars
  • Apply the changes, ensuring you use the correct environment config (depending upon which env you are configuring):

    E.g. for the dev environment run

    terraform apply -var-file=env/dev/dev.tfvars
  • When the terraform has applied successfully, the running task will have been replaced by a task running the container version you specified in the tfvars file.
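Concretely, the tfvars edit in the steps above amounts to a two-line change (the version value here is illustrative):

```hcl
# env/dev/dev.tfvars (fragment; version value illustrative)
container_ver    = "0.0.4"
force_deployment = true
```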

Destroying Service Resources

The resources for the service are applied using separate terraform for the main service, storage and authentication.

Destroy Only the Main Service Resources

The separation of the terraform enables the main service to be destroyed independent of the storage and authentication. This allows any data to persist and means the user list for the application does not have to be reconstructed.

  • Delete the service resources by running the following, ensuring you reference the correct environment files for the backend-config and var files:

    cd terraform/service
    
    terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure
    
    terraform refresh -var-file=env/dev/dev.tfvars
    
    terraform destroy -var-file=env/dev/dev.tfvars

Destroy All of the Service Resources

To destroy all resources, the destroy must happen in the following order: storage, then service, and finally authentication:

  • Ensure that all data in S3 can be deleted and, if so, manually delete any objects and versioned objects in S3.

  • Update the lifecycle rule in storage.tf to prevent_destroy = false. This is a temporary change whilst resources are destroyed and must be reverted once complete.

  • Delete the storage resources by running the following, ensuring you reference the correct environment files for the backend-config and var files:

    cd terraform/storage
    
    terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure
    
    terraform refresh -var-file=env/dev/dev.tfvars
    
    terraform destroy -var-file=env/dev/dev.tfvars
  • Delete the service resources by running the following, ensuring you reference the correct environment files for the backend-config and var files:

    cd terraform/service
    
    terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure
    
    terraform refresh -var-file=env/dev/dev.tfvars
    
    terraform destroy -var-file=env/dev/dev.tfvars
  • Delete the authentication resources by running the following, ensuring you reference the correct environment files for the backend-config and var files:

    cd terraform/authentication
    
    terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure
    
    terraform refresh -var-file=env/dev/dev.tfvars
    
    terraform destroy -var-file=env/dev/dev.tfvars

