aws-samples / provision-codepipeline-glue-workflows

Git repo to accompany the AWS DevOps Blog: Using AWS DevOps Tools to model and provision AWS Glue workflows

Home Page: https://aws.amazon.com/blogs/devops/provision-codepipeline-glue-workflows

License: MIT No Attribution

Languages: Python 100.00%
Topics: aws, codepipeline, codebuild, cloudformation, glue-etl, glue-workflow

Using AWS DevOps Tools to model and provision AWS Glue workflows

This post provides a step-by-step guide to modeling and provisioning AWS Glue workflows using infrastructure as code (IaC), a DevOps practice that emphasizes templates, source control, and automation. The cloud resources in this solution are defined in AWS CloudFormation templates and provisioned automatically by AWS CodePipeline and AWS CodeBuild. These AWS DevOps tools are flexible, interchangeable, and well suited for automating the deployment of AWS Glue workflows into different environments such as dev, test, and production, which typically reside in separate AWS accounts and Regions.

AWS Glue workflows let you manage dependencies between the components of an end-to-end ETL data pipeline by grouping a set of related jobs, crawlers, and triggers into one logical run unit. Many customers using AWS Glue workflows start by defining the pipeline on the AWS Management Console and then move on to monitoring and troubleshooting using the console, AWS APIs, or the AWS Command Line Interface (AWS CLI).

Architecture Diagram

[Image: architecture diagram]

Deploy the solution from the CodePipeline stack

AWS CLI command to deploy the CodePipeline stack:

aws cloudformation deploy \
--stack-name codepipeline-covid19 \
--template-file cloudformation/codepipeline-stack.yml \
--capabilities CAPABILITY_NAMED_IAM \
--no-fail-on-empty-changeset \
--region <AWS_REGION>
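Because the solution targets several environments, it can help to assemble the deploy command programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_deploy_command` is hypothetical, and it only mirrors the flags shown in the command above.

```python
from typing import List


def build_deploy_command(stack_name: str, template_file: str, region: str) -> List[str]:
    """Return the argument list for the `aws cloudformation deploy` call shown above.

    Hypothetical helper: only the stack name, template path, and Region vary;
    the remaining flags are copied from the README command.
    """
    return [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack_name,
        "--template-file", template_file,
        "--capabilities", "CAPABILITY_NAMED_IAM",
        "--no-fail-on-empty-changeset",
        "--region", region,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the AWS CLI is installed and configured, for example with `build_deploy_command("codepipeline-covid19", "cloudformation/codepipeline-stack.yml", region)` where `region` is your own AWS Region.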

Command to zip source code:

zip -r source.zip . -x images/\* *.history* *.git* *.DS_Store*
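On machines without the `zip` utility, a roughly equivalent archive can be built with the Python standard library. This is a sketch under assumptions: the function names are hypothetical, and `fnmatch` wildcards are used to approximate the `-x` patterns of the command above (in both, `*` also matches `/`).

```python
import fnmatch
import os
import zipfile

# Exclusion patterns copied from the -x list of the zip command above.
EXCLUDES = ["images/*", "*.history*", "*.git*", "*.DS_Store*"]


def is_excluded(rel_path, patterns=EXCLUDES):
    """Return True if the POSIX-style relative path matches any exclusion pattern."""
    return any(fnmatch.fnmatchcase(rel_path, p) for p in patterns)


def zip_source(root, archive_path, patterns=EXCLUDES):
    """Zip every file under root except excluded ones; return the sorted archived paths."""
    archive_abs = os.path.abspath(archive_path)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Never add the archive to itself when it lives inside root.
                if os.path.abspath(full) == archive_abs:
                    continue
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                if is_excluded(rel, patterns):
                    continue
                zf.write(full, rel)
                added.append(rel)
    return sorted(added)
```

Calling `zip_source(".", "source.zip")` from the repository root produces the archive to upload in the next step.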

AWS CLI command to upload source code:

aws s3 cp source.zip s3://covid19-codepipeline-source-<AWS_ACCOUNT_ID>-<AWS_REGION>

Note: Uploading the source code starts an execution of the pipeline named DeployPipelineForGlueWorkflow-codepipeline-covid19.

Deploy the solution from your machine without CodePipeline

AWS CLI command to deploy the AWS Glue workflow stack:

aws cloudformation deploy \
--stack-name glue-covid19 \
--template-file cloudformation/glue-workflow-stack.yml \
--capabilities CAPABILITY_NAMED_IAM \
--no-fail-on-empty-changeset \
--region <AWS_REGION>

AWS CLI command to copy the Python scripts to the glue-scripts/ prefix of the dataset bucket:

aws s3 cp src s3://covid19-dataset-<AWS_ACCOUNT_ID>-<AWS_REGION>/glue-scripts/ --recursive

Running the workflow

The workflow runs automatically at 8:00 AM UTC. To start the workflow manually, you can use either the AWS CLI or the AWS Glue console.
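To check when the scheduled run mentioned above will next fire, the time can be computed locally. This sketch assumes the schedule is daily; the expression `cron(0 8 * * ? *)` is the AWS cron form for "08:00 UTC every day" and is an assumption here, since the template's actual trigger definition is not shown in this README. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# Assumed schedule expression (AWS cron syntax) for a daily 08:00 UTC trigger.
ASSUMED_SCHEDULE = "cron(0 8 * * ? *)"


def next_scheduled_run(now):
    """Return the next 08:00 UTC instant strictly after `now`.

    `now` must be a timezone-aware datetime; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(8, 0), tzinfo=timezone.utc)
    if candidate <= now_utc:
        # Today's 08:00 UTC slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

For example, `next_scheduled_run(datetime.now(timezone.utc))` returns the upcoming 08:00 UTC slot under the daily-schedule assumption.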

AWS CLI command to start the AWS Glue workflow:

aws glue start-workflow-run --name Covid_19 --region <AWS_REGION>
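The same start request can be issued from Python and followed until the run finishes. The sketch below is illustrative rather than part of the repository: `run_workflow` is a hypothetical helper that expects a boto3 Glue client (or any object with the same `start_workflow_run` and `get_workflow_run` methods), and the polling interval is an arbitrary choice.

```python
import time

# Workflow run statuses after which polling should stop.
TERMINAL_STATES = {"COMPLETED", "STOPPED", "ERROR"}


def run_workflow(glue, name, poll_seconds=30, sleep=time.sleep):
    """Start the named Glue workflow and poll until it reaches a terminal status.

    `glue` is assumed to be a boto3 Glue client. Returns (run_id, status).
    """
    run_id = glue.start_workflow_run(Name=name)["RunId"]
    while True:
        run = glue.get_workflow_run(Name=name, RunId=run_id)["Run"]
        status = run["Status"]
        if status in TERMINAL_STATES:
            return run_id, status
        # Wait before asking again to avoid hammering the API.
        sleep(poll_seconds)
```

With boto3 installed and credentials configured, this could be called as `run_workflow(boto3.client("glue", region_name=region), "Covid_19")`, where `region` is your own AWS Region. A `COMPLETED` status only indicates that the run finished; the `Statistics` field of the returned run is the place to look for failed actions.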

To start the workflow on the AWS Glue console, on the Workflows page, select your workflow and choose Run on the Actions menu.

The following screenshot shows a visual representation of the workflow as a graph with your run details.

[Image: workflow graph with run details]

Interested in Contributing?

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
