DO NOT USE THIS REPO - MIGRATED TO GITLAB

aws-azkaban

An AWS based azkaban platform

Description

AWS Azkaban deploys two containerised instances of Azkaban that both back onto an AWS EMR cluster, along with the peripheral infrastructure required for functionality and security. One instance serves users through the Analytical Env; the other (referred to as azkaban_external) is for admins and engineers and is accessible directly through a URL. The frontend of the service is handled by the webserver containers, which send tasks to the executors for processing. An Aurora Serverless database tracks the active executors so that the webservers can call on them when needed.

Development

This repo contains only the IaC and lambdas, which can be developed as they are found. The Azkaban containers themselves can be found here, along with further documentation on them. The containers are pushed to ECR and referenced by name by the infrastructure in this repo.

Deployment is handled by a Concourse job; the pipeline code can be found in the /ci directory. There are also admin CI jobs for cycling the containers and rotating passwords.

Lambdas

There are 3 lambdas in this repo that carry out administrative tasks:

1. azkaban-truncate-table: Used to truncate the active executors table to ensure no inactive executors are called upon redeployment of the service (a sketch of this follows the list).

2. azkaban-zip-uploader: Used to upload .zip files containing Azkaban projects from AWS S3 to Azkaban. The lambda is triggered by a *.success file being uploaded to a directory in the given S3 bucket and is used to safely access the Azkaban API from within the VPC.

3. manage-azkaban-mysql-user: Used to rotate the credentials used to access the Aurora Serverless DB mentioned above.
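As a rough illustration of the first of these, the sketch below shows what a truncate-table handler could look like in Python, assuming credentials are read from Secrets Manager and a direct pymysql connection is used. The environment variable names, table name, and driver choice are assumptions, not the repo's actual implementation (the real lambda may, for example, use the RDS Data API instead).

```python
import json
import os

import boto3
import pymysql  # assumed driver; the real lambda may use a different client


def handler(event, context):
    """Truncate the active-executors table so that stale executors are not
    picked up after a redeployment (illustrative sketch only)."""
    secrets = boto3.client("secretsmanager")
    creds = json.loads(
        secrets.get_secret_value(SecretId=os.environ["DB_SECRET_ID"])["SecretString"]
    )

    connection = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=creds["username"],
        password=creds["password"],
        database=os.environ["DB_NAME"],
    )
    try:
        with connection.cursor() as cursor:
            # 'executors' is an assumed table name used here for illustration.
            cursor.execute("TRUNCATE TABLE executors;")
        connection.commit()
    finally:
        connection.close()
```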

Access

Production Azkaban can be found here.

Other environments: Dev, QA, INT, PreProd

Authentication

Currently, two instances of Azkaban are deployed: user and external. They use different authentication methods; external uses Cognito. To accommodate programmatic access to Azkaban external, a Cognito user was created, with the details stored in the ../azkaban_external/cognito secret. To accommodate programmatic access to Azkaban user, a traditional user is created, with the details stored in the ../workflow_manager secret.

Access to user Azkaban via the Analytical Env is managed by RBAC v2 roles in the batch security config on the batch cluster, and jobs are run as the user that triggered them. Access to external Azkaban is managed by the Concourse Cognito user pool, and jobs run in external Azkaban are run as a service user. These settings are toggled in the config/*.properties files and controlled by the extensions mentioned below.
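For the programmatic access to external Azkaban described above, a client first needs to obtain a Cognito token via the USER_SRP_AUTH flow. Below is a minimal Python sketch using the third-party pycognito library, which is not necessarily what this repo's tooling uses; the pool, client, and user details are hypothetical placeholders for values held in the ../azkaban_external/cognito secret.

```python
from pycognito import Cognito  # third-party helper that handles the SRP calculation

# Hypothetical identifiers; the real values live in the secret mentioned above.
user = Cognito(
    user_pool_id="eu-west-2_examplePool",
    client_id="exampleClientId",
    username="azkaban-service-user",
)

# authenticate() performs the USER_SRP_AUTH flow against the user pool.
user.authenticate(password="example-password")

# The resulting JWT can then be presented to Azkaban's CognitoUserManager.
print(user.id_token)
```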

Requirements

  • Terraform 1.0.11
  • Python 3
  • JQ
  • Access to Dataworks AWS Environments

Bootstrapping

Before beginning, you will need to generate some Terraform files from templates. To do this, simply run the following:

make bootstrap

You will then be able to develop against the development account (the default Terraform workspace).

Azkaban Extensions

CognitoUserManager - An extension to the XML UserManager that can also receive a Cognito JSON Web Token. The user manager decodes and validates the token and, from this information, is able to authenticate the user.

CognitoUserManagerProxy - An extension which validates a user's name and password against an existing Cognito User Pool using USER_SRP_AUTH type authentication. The pool details are stored in the webserver's azkaban.properties.

EMR JobType - A job type that extends the process job type and can receive the script and arguments that need to be submitted to the cluster. It ensures that the correct group it needs to be run as is submitted along with the script.
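The job type itself is an Azkaban plugin, so the following is only a hedged boto3 sketch of what submitting a script and its arguments (including the group to run as) to an EMR cluster can look like; the cluster ID, script path, and argument names are assumptions, not the plugin's implementation.

```python
import boto3

emr = boto3.client("emr")

# Hypothetical cluster ID, script location, and arguments; the real values are
# supplied to the job type via the Azkaban job definition.
response = emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLECLUSTER",
    Steps=[
        {
            "Name": "azkaban-submitted-step",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # The group to run as is passed along with the script, per the
                # description above; '--group' is an illustrative flag only.
                "Args": ["/opt/scripts/example.sh", "--group", "example_group", "--arg1", "value1"],
            },
        }
    ],
)
print(response["StepIds"])
```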

Monitoring

Currently, only external Azkaban is monitored, in the dev, preprod and prod environments. This is configured as aws_cloudwatch_metric_alarms in the external_monitoring.tf file. The active alerts cover mismatches between desired and running task counts for the webservers and executors, and 500 errors in the frontend. When triggered, these alerts are sent to the dataworks-aws-service-alerts Slack channel.

High level infrastructure outline

AWS Azkaban Infrastructure

Monitoring Canary

There is a monitoring project on Azkaban, called 'monitoring', which runs every 10 minutes. This task acts as a canary: if the canary succeeds, it prints 'Hello World' into the executor log files. CloudWatch watches for this log line and, when it finds it, records '1' against the 'azkaban-external-monitoring-canary-success' metric.

An alarm checks that the 'azkaban-external-monitoring-canary-success' metric has a datapoint of value 1 or above every 15 minutes; if it does not, it raises an alert.
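The real metric filter and alarm are defined as Terraform resources; purely to illustrate their semantics, the boto3 sketch below creates an equivalent pair. The log group name, namespace, and alarm name are assumptions.

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Metric filter: count 'Hello World' lines in the executor log group.
logs.put_metric_filter(
    logGroupName="/aws/ecs/azkaban-external-executor",  # assumed log group name
    filterName="azkaban-external-monitoring-canary-success",
    filterPattern='"Hello World"',
    metricTransformations=[{
        "metricName": "azkaban-external-monitoring-canary-success",
        "metricNamespace": "Azkaban",  # assumed namespace
        "metricValue": "1",
    }],
)

# Alarm: breach if the metric has no datapoint of 1 or above in a 15-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="azkaban-external-monitoring-canary",  # assumed alarm name
    Namespace="Azkaban",
    MetricName="azkaban-external-monitoring-canary-success",
    Statistic="Sum",
    Period=900,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",
)
```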

Schedule of Monitoring Canary

In the event that schedules are lost on Azkaban, you must set the schedule of the monitoring canary manually, as it requires the {user.to.proxy} value to be that of a valid Azkaban user (find this user in the Azkaban executor logs). This parameter is not yet supported in the azkaban-job-scheduler lambda.
