spark-zeppelin-emr-tf's Introduction

Spark and Zeppelin on AWS EMR

Note: This is currently intended as an instructive example and still has some rough patches to work out. Use at your own risk.

This is a composition of three modules:

  • terraform-aws-modules/vpc/aws for creating a VPC with a fully private subnet
  • jetbrains-infra/terraform-aws-bastion-host for creating a jumpbox into the private subnets
  • the module defined here

If you already have a VPC and jumpbox set up, use the modules/spark-zeppelin-emr module directly rather than main.tf.

Prerequisites

This module requires Terraform version 0.12.23.

A version manager can help if you need different versions of Terraform for other purposes, but it is not required. If you want to use one, asdf has worked well for the author in the past.
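As a rough sketch, the installed Terraform version can be compared with the one named above before running anything. The helper name `check_tf_version` is hypothetical, and the check assumes the requirement means exactly 0.12.23 rather than a minimum version. The asdf commands in the comments assume the community Terraform plugin is available under the name `terraform`.

```shell
#!/bin/sh
# Version this README says the module requires.
REQUIRED_TF_VERSION="0.12.23"

# Hypothetical helper: compares a version string with the required one.
# Accepts "0.12.23" or "v0.12.23" (the form printed by `terraform version`).
check_tf_version() {
  found="${1#v}"   # strip a leading "v" if present
  if [ "$found" = "$REQUIRED_TF_VERSION" ]; then
    echo "ok"
  else
    echo "mismatch: found $found, expected $REQUIRED_TF_VERSION"
    return 1
  fi
}

# Example usage (requires terraform on PATH); the first line of
# `terraform version` looks like "Terraform v0.12.23":
#   check_tf_version "$(terraform version | head -n 1 | awk '{print $2}')"
#
# With asdf, assuming the terraform plugin name is "terraform":
#   asdf plugin add terraform
#   asdf install terraform 0.12.23
```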

Manual configuration

  • The 3rd-party module for creating the bastion does not export much information about the instance. Since the EMR cluster is configured to only allow access via an SSH tunnel from the bastion, the bastion's security group has to be passed to the EMR module manually. This means that currently the workflow is:
    • comment out the spark-zeppelin-emr module in main.tf
    • terraform apply
    • uncomment the spark-zeppelin-emr module
    • set bastion-security-group to the security group of the newly created bastion
    • terraform apply
  • The default IAM roles and instance profiles used by EMR might not exist until a cluster has been created manually. If you get an error about them, it is probably best to use the cluster creation wizard in the AWS console to create them before using this script.
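The two-step workflow above could also be approximated without editing main.tf by using Terraform's `-target` option, which limits an apply to the named resources or modules. This is a swapped-in alternative, not the author's method, and the module addresses `module.vpc` and `module.bastion` are placeholders that need to match the names actually used in main.tf. The last function uses the AWS CLI command `aws emr create-default-roles` as a possible alternative to the console wizard; it assumes the AWS CLI is installed and configured.

```shell
#!/bin/sh
# Assumption: placeholder module addresses; check main.tf for the real names.
BOOTSTRAP_TARGETS="module.vpc module.bastion"

# Phase 1: create only the VPC and bastion, so the bastion's security group
# exists before the EMR module needs it.
apply_bootstrap() {
  target_args=""
  for target in $BOOTSTRAP_TARGETS; do
    target_args="$target_args -target=$target"
  done
  # Word splitting of $target_args is intentional here.
  terraform apply $target_args
}

# Phase 2: after setting bastion-security-group by hand, apply everything.
apply_full() {
  terraform apply
}

# Optional: create the default EMR roles from the CLI instead of the console.
create_emr_default_roles() {
  aws emr create-default-roles
}
```

A run would call `apply_bootstrap`, then update bastion-security-group, then call `apply_full`.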

Execution

TF_VAR_key_name="<some key in ec2>" \
 terraform apply

SSH configuration

To access the EMR cluster over SSH, which is required to tunnel to the UIs hosted on EMR such as Zeppelin, a configuration like the following will work:

Host bastion
  HostName <public IP or host name of bastion>
  User centos
  IdentityFile ~/path/to/private_key.pem
  ForwardAgent yes

Host emr-master
  HostName <private IP of the EMR master node>
  User hadoop
  ProxyJump bastion
  IdentityFile ~/path/to/private_key.pem

The bastion and emr-master Host values are arbitrary aliases and can be changed as you see fit so long as the ProxyJump reference is also updated.

This configuration will enable SSHing into the EMR master with just:

ssh emr-master

This handles:

  • logging into the bastion with the correct user name and key
  • instructing the bastion to forward the SSH agent
  • jumping to the emr-master node with the correct user name and key
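With that configuration in place, a tunnel to one of the UIs might look like the sketch below. It assumes Zeppelin listens on port 8890 on the master node, which is the EMR default, and that the `emr-master` alias from the configuration above is in use. The function name `open_tunnel` is hypothetical.

```shell
#!/bin/sh
# Assumption: Zeppelin's default port on EMR; adjust if configured differently.
ZEPPELIN_PORT=8890

# Forwards a local port to the same port on the EMR master via the bastion.
# -N: do not run a remote command; -L: local port forwarding.
open_tunnel() {
  port="${1:-$ZEPPELIN_PORT}"
  host="${2:-emr-master}"
  ssh -N -L "${port}:localhost:${port}" "$host"
}

# Example usage:
#   open_tunnel          # then browse to http://localhost:8890
#   open_tunnel 18080    # Spark history server, assuming the EMR default port
```

The command stays in the foreground while the tunnel is open; stop it with Ctrl-C.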
