cloudformationemr's Introduction

Quick Deploy

  1. Create default roles:
aws emr create-default-roles
  2. Deploy template:

Basic Template

aws cloudformation create-stack --stack-name emr --template-body file://./emr-basic-template.yml

Complex Template

aws cloudformation create-stack --stack-name emr-testing \
--template-body file://./emr-complex-template.yml --parameters \
ParameterKey=keyPair,ParameterValue=gen_key_pair \
ParameterKey=subnetID,ParameterValue=subnet-6d482025 \
ParameterKey=clusterName,ParameterValue=emr-testing \
ParameterKey=taskInstanceCount,ParameterValue=0 \
ParameterKey=coreInstanceType,ParameterValue=m1.medium \
ParameterKey=taskInstanceType,ParameterValue=m1.medium \
ParameterKey=emrVersion,ParameterValue=emr-5.3.0 \
ParameterKey=environmentType,ParameterValue=test \
ParameterKey=masterInstanceType,ParameterValue=m1.medium \
ParameterKey=s3BucketBasePath,ParameterValue=emr-test-logs-mm \
ParameterKey=terminationProtected,ParameterValue=false \
ParameterKey=taskBidPrice,ParameterValue=0 --region us-east-1
  3. Wait for ResourceStatus = CREATE_COMPLETE (use --stack-name emr-testing if you deployed the complex template):
while true; do aws cloudformation describe-stack-events --stack-name emr | grep ResourceStatus; sleep 2; done
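As an alternative to the polling loop, the AWS CLI ships a built-in waiter that blocks until the stack is created. The following is a minimal sketch, assuming the stack name emr from the basic template command; the function name wait_for_stack is hypothetical and not part of this repository.

```shell
# wait_for_stack is a hypothetical helper name, not part of this repository.
# Blocks until stack creation finishes, then prints the final stack status.
wait_for_stack() {
  stack_name="$1"
  # The waiter exits non-zero if creation fails or the wait times out.
  if aws cloudformation wait stack-create-complete --stack-name "$stack_name"; then
    aws cloudformation describe-stacks --stack-name "$stack_name" \
      --query 'Stacks[0].StackStatus' --output text
  else
    echo "Stack $stack_name did not reach CREATE_COMPLETE" >&2
    return 1
  fi
}

# Usage:
#   wait_for_stack emr
```

Unlike the grep loop, this returns on its own, so it can be chained in a script before the next command.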

Update

aws cloudformation update-stack --stack-name emr-testing \
--use-previous-template --parameters \
ParameterKey=keyPair,UsePreviousValue=true \
ParameterKey=subnetID,UsePreviousValue=true \
ParameterKey=clusterName,UsePreviousValue=true \
ParameterKey=taskInstanceCount,UsePreviousValue=true \
ParameterKey=coreInstanceType,UsePreviousValue=true \
ParameterKey=taskInstanceType,UsePreviousValue=true \
ParameterKey=emrVersion,UsePreviousValue=true \
ParameterKey=environmentType,UsePreviousValue=true \
ParameterKey=masterInstanceType,UsePreviousValue=true \
ParameterKey=s3BucketBasePath,ParameterValue=emr-test-logs-mm \
ParameterKey=terminationProtected,UsePreviousValue=true \
ParameterKey=taskBidPrice,UsePreviousValue=true --region us-east-1
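The update command repeats UsePreviousValue=true for every parameter except the one being changed. The list can be generated instead of typed out. This is a sketch only: build_update_params is a hypothetical helper, and the parameter names are the ones used in the complex template command above.

```shell
# build_update_params is a hypothetical helper, not part of this repository.
# Usage: build_update_params OVERRIDE_KEY OVERRIDE_VALUE KEY...
# Prints one --parameters argument per line: the override key gets a new
# ParameterValue, every other key gets UsePreviousValue=true.
build_update_params() {
  override_key="$1"
  override_value="$2"
  shift 2
  for key in "$@"; do
    if [ "$key" = "$override_key" ]; then
      echo "ParameterKey=$key,ParameterValue=$override_value"
    else
      echo "ParameterKey=$key,UsePreviousValue=true"
    fi
  done
}

# Usage (word splitting of the command substitution is intentional):
#   aws cloudformation update-stack --stack-name emr-testing \
#     --use-previous-template --region us-east-1 \
#     --parameters $(build_update_params s3BucketBasePath emr-test-logs-mm \
#       keyPair subnetID clusterName taskInstanceCount coreInstanceType \
#       taskInstanceType emrVersion environmentType masterInstanceType \
#       s3BucketBasePath terminationProtected taskBidPrice)
```

This form only works for values without spaces or commas, which holds for the parameters shown here.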

TearDown

aws cloudformation delete-stack --stack-name emr

Submitting a step

  1. Configure a VPC Endpoint for Amazon S3. See the following blog post for instructions on how to do this if unsure: https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3/
  2. Create log, source, and destination buckets:
aws s3api create-bucket --bucket emr-test-logs-<initials> --region us-east-1
aws s3api create-bucket --bucket emr-test-source-<initials> --region us-east-1
aws s3api create-bucket --bucket emr-test-dest-<initials> --region us-east-1
  3. Upload the 'spark-test-cluster.py' and 'test-data.txt' files to the source bucket.
  4. Submit the step to the cluster:
aws emr list-clusters | grep -i waiting -A 7
aws emr add-steps --cluster-id j-xxxxxxxx --steps Type=spark,Name=SparkWordCountApp,Args=[--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=false,--num-executors,2,--executor-cores,1,--executor-memory,512m,s3a://<source-bucket>/spark-test-cluster.py,s3a://<source-bucket>/test-data.txt,s3a://<dest-bucket>/],ActionOnFailure=CONTINUE

Adjust num-executors, executor-cores, and executor-memory as appropriate for your cluster and job.
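The cluster ID for add-steps can also be looked up without reading grep output by hand. The following is a minimal sketch, assuming you want the first cluster in the WAITING state; find_waiting_cluster is a hypothetical helper name.

```shell
# find_waiting_cluster is a hypothetical helper name, not part of this repository.
# Prints the ID of the first cluster in the WAITING state, or fails if none exists.
find_waiting_cluster() {
  cluster_id="$(aws emr list-clusters --cluster-states WAITING \
    --query 'Clusters[0].Id' --output text)" || return 1
  # With --output text, an empty result is printed as the literal string "None".
  if [ -z "$cluster_id" ] || [ "$cluster_id" = "None" ]; then
    echo "No cluster in WAITING state" >&2
    return 1
  fi
  echo "$cluster_id"
}

# Usage:
#   cluster_id="$(find_waiting_cluster)" && \
#     aws emr add-steps --cluster-id "$cluster_id" --steps <step definition as above>
```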

Configuring Zeppelin

  1. Set up an ssh tunnel to Zeppelin's service port:
ssh -i gen-keypair.pem -L 8890:localhost:8890 hadoop@ec2-[redacted].compute-1.amazonaws.com -N
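Once the tunnel is running, you can check from a second terminal that something answers on the forwarded port before opening a browser. This is a sketch under assumptions: zeppelin_reachable is a hypothetical helper name, and it only tests that an HTTP server responds on the local port, not that the server is Zeppelin.

```shell
# zeppelin_reachable is a hypothetical helper name, not part of this repository.
# Succeeds if an HTTP server answers on the forwarded local port (default 8890).
zeppelin_reachable() {
  port="${1:-8890}"
  # curl exits non-zero if the connection is refused, e.g. when the tunnel is down.
  status="$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:$port/")" || return 1
  # Treat any 2xx or 3xx status as reachable.
  case "$status" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   zeppelin_reachable && echo "Open http://localhost:8890 in a browser"
```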
