tzolov / vagrant-pivotalhd

Use Vagrant and the Ambari Blueprint API to install a PivotalHD 3.0 (or Hortonworks HDP 2.x) Hadoop cluster with HAWQ 1.3 (SQL on Hadoop) and Spring XD 1.2.

License: Apache License 2.0


vagrant-pivotalhd's Introduction

Multi-VM PivotalHD 3.0 (or Hortonworks HDP 2.x) Hadoop Cluster with HAWQ and Spring XD

This project leverages Vagrant and Apache Ambari to create a multi-VM PivotalHD 3.0 or Hortonworks HDP 2.x Hadoop cluster, including HAWQ 1.3 (SQL on Hadoop) and Spring XD 1.2.


The logical structure of the cluster is defined in a Blueprint. A related Host-Mapping defines how the blueprint is mapped onto physical machines. The Vagrantfile script provisions virtual machines (VMs) for the hosts defined in the Host-Mapping and, with the help of the Ambari Blueprint API, deploys the Blueprint on the cluster. Both the PivotalHD 3.0 (PHD) and Hortonworks 2.x (HDP) blueprint stacks are supported.

The default All-Services blueprint creates four virtual machines: one for Apache Ambari and three for the Pivotal HD cluster, on which Apache Hadoop® (HDFS, YARN, Pig, ZooKeeper, HBase), HAWQ (SQL on Hadoop) and Spring XD are installed.
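For orientation, an Ambari blueprint is a JSON document of roughly the following shape. This is a minimal, hypothetical sketch (the host-group names and component lists are illustrative), not the actual phd-all-services-blueprint.json:

```json
{
  "Blueprints": {
    "blueprint_name": "all-services-blueprint",
    "stack_name": "PHD",
    "stack_version": "3.0"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "RESOURCEMANAGER" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    },
    {
      "name": "worker",
      "cardinality": "2",
      "components": [
        { "name": "DATANODE" },
        { "name": "NODEMANAGER" }
      ]
    }
  ]
}
```

The Host-Mapping file then assigns concrete FQDNs to each host group when the cluster is instantiated.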

Prerequisite

  • From a hardware standpoint, you need a 64-bit architecture; the default blueprint requires at least 16 GB of physical memory and around 120 GB of free disk space (you can get by with only 24 GB of disk space, but you will not be able to install all Pivotal services together).
  • Install Vagrant (1.7.2+).
  • Install VirtualBox or VMware Fusion (note that VMware Fusion requires a paid Vagrant license).

Environment Setup

  • Clone this project
git clone https://github.com/tzolov/vagrant-pivotalhd.git
  • Follow the Packages download instructions to collect all required tarballs and store them inside the /packages subfolder.
  • Edit the Vagrantfile BLUEPRINT_FILE_NAME and HOST_MAPPING_FILE_NAME properties to select the Blueprint/Host-Mapping pair to deploy. All blueprint and mapping files are in the /blueprints subfolder. By default the 4-node All-Services blueprint is used.
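Inside the Vagrantfile the selection amounts to two Ruby constants; with the default values the relevant lines look roughly like this (exact surrounding code may differ):

```ruby
# Select which Blueprint/Host-Mapping pair from /blueprints to deploy.
# These are the defaults; change the file names to pick a different pair.
BLUEPRINT_FILE_NAME    = "phd-all-services-blueprint.json"
HOST_MAPPING_FILE_NAME = "4-node-all-services-hostmapping.json"
```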

Create Hadoop cluster

From the top directory run

vagrant up --provider virtualbox

Depending on the blueprint stack, either a PivotalHD or a Hortonworks cluster will be created. The default blueprint/host-mapping pair creates 4 virtual machines. When the vagrant up command returns, the VMs are provisioned, the Ambari Server is installed, and the cluster deployment is in progress. Open the Ambari web interface to monitor the deployment progress:

http://10.211.55.100:8080

(username: admin, password: admin)
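Once the cluster is up, the usual Vagrant lifecycle commands apply. A short reference (run from the project root; the VM names shown follow the defaults, e.g. ambari and phd1..phd3):

```shell
vagrant status            # show the state of all cluster VMs
vagrant ssh ambari        # open a shell on the Ambari VM
vagrant halt              # stop the VMs while keeping their state
vagrant destroy -f        # tear the cluster down completely
```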

Vagrant Configuration Properties

The following Vagrantfile configuration properties can be used to customize a cluster deployment. For instructions on how to create a custom Blueprint or Host-Mapping, read the blueprints section.

| Property | Description | Default Value |
| --- | --- | --- |
| BLUEPRINT_FILE_NAME | Specifies the Blueprint file name to deploy. The file must exist in the /blueprints subfolder. | phd-all-services-blueprint.json |
| HOST_MAPPING_FILE_NAME | Specifies the Host-Mapping file name to deploy. The file must exist in the /blueprints subfolder. | 4-node-all-services-hostmapping.json |
| CLUSTER_NAME | Sets the cluster name as it will appear in Ambari. | CLUSTER1 |
| VM_BOX | Vagrant box name to use. Tested options are: bigdata/centos6.4_x86_64 (40 GB disk), bigdata/centos6.4_x86_64_small (just 8 GB of disk space) and chef/centos-6.6 (CentOS 6.6 box). | chef/centos-6.6 |
| AMBARI_NODE_VM_MEMORY_MB | Memory (MB) allocated for the Ambari VM. | 768 |
| PHD_NODE_VM_MEMORY_MB | Memory (MB) allocated for every PHD VM. | 2048 |
| AMBARI_HOSTNAME_PREFIX | Sets the Ambari host name prefix. The suffix is fixed to '.localdomain'. Note: the FQDN must NOT be in the phd[1-N].localdomain range. | ambari |
| DEPLOY_BLUEPRINT_CLUSTER | Set to TRUE to deploy a cluster defined by BLUEPRINT_FILE_NAME and HOST_MAPPING_FILE_NAME. Set to FALSE if you prefer to install the cluster with the Ambari wizard. | TRUE |
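When DEPLOY_BLUEPRINT_CLUSTER is FALSE, you can still drive the deployment by hand through Ambari's Blueprint REST API instead of the wizard. A hedged sketch (the endpoint paths follow the public Ambari Blueprints API; the blueprint and cluster names are illustrative and must match your JSON files):

```shell
# Register the blueprint with Ambari
# (the URL path segment must match "blueprint_name" inside the JSON)
curl -u admin:admin -H "X-Requested-By: vagrant" -X POST \
  -d @blueprints/phd-all-services-blueprint.json \
  http://10.211.55.100:8080/api/v1/blueprints/all-services-blueprint

# Instantiate a cluster from the blueprint using the host-mapping
# (a.k.a. the cluster creation template)
curl -u admin:admin -H "X-Requested-By: vagrant" -X POST \
  -d @blueprints/4-node-all-services-hostmapping.json \
  http://10.211.55.100:8080/api/v1/clusters/CLUSTER1
```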

vagrant-pivotalhd's People

Contributors

dbbaskette, falsamawi-pivotal, tzolov


vagrant-pivotalhd's Issues

404 not found

I get the following error. Is there any issue with the Vagrant box?

vagrant up

CLUSTER NAME: PHD30C1
BLUEPRINT NAME: all-services-blueprint
STACK: PHD-3.0
BLUEPRINT FILE: phd-all-services-blueprint.json
HOST-MAPPING FILE: 4-node-all-services-hostmapping.json
Ambari Provision Script: provision/phd_install_ambari.sh
Number of cluster nodes (excluding Ambari): 3
Bringing machine 'phd1' up with 'virtualbox' provider...
Bringing machine 'phd2' up with 'virtualbox' provider...
Bringing machine 'phd3' up with 'virtualbox' provider...
Bringing machine 'ambari' up with 'virtualbox' provider...
==> phd1: Box 'bigdata/centos6.4_x86_64' could not be found. Attempting to find and install...
phd1: Box Provider: virtualbox
phd1: Box Version: >= 0
==> phd1: Loading metadata for box 'bigdata/centos6.4_x86_64'
phd1: URL: https://atlas.hashicorp.com/bigdata/centos6.4_x86_64
==> phd1: Adding box 'bigdata/centos6.4_x86_64' (v1.0.0) for provider: virtualbox
phd1: Downloading: https://atlas.hashicorp.com/bigdata/boxes/centos6.4_x86_64/versions/1.0.0/providers/virtualbox.box
An error occurred while downloading the remote file. The error
message, if any, is reproduced below. Please fix this error and try
again.

The requested URL returned error: 404 Not Found

Upgrade to PHD3.0.1 and HAWQ 1.3.1

PHD and HAWQ have shipped minor maintenance releases. Update the PHD provisioning script and the related packages documentation to the latest package versions.

Add MADlib for HAWQ

MADlib is a unique value-add from Pivotal to the SQL-on-Hadoop market; it would be great if it ran out of the box on top of this. Maybe add a flag to install it optionally.

Add Spark 1.4

It would be really great if we could get a working Spark 1.4 configuration into this. Please use a Spark version that was compiled against Hive, to enable features such as persisting tables.

Make the node IP range configurable

Currently the IPs assigned to the VMs are fixed to the range [10.211.55.100, 10.211.55.xxx).
Sometimes this IP range may already be in use by the host or by other VMs. It should be possible to set the IP prefix in the Vagrant configuration properties.
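One possible fix, sketched as a Vagrantfile change. NODE_IP_PREFIX and the node_ip helper are hypothetical names, not part of the current Vagrantfile; the defaults preserve today's addresses:

```ruby
# Hypothetical: make the subnet prefix configurable instead of
# hard-coding 10.211.55 into every node definition.
NODE_IP_PREFIX = ENV.fetch("NODE_IP_PREFIX", "10.211.55")

# Node i (0-based) gets <prefix>.<100 + i>, so node 0 stays at
# 10.211.55.100 unless the prefix is overridden.
def node_ip(i)
  "#{NODE_IP_PREFIX}.#{100 + i}"
end
```

With such a change, something like `NODE_IP_PREFIX=192.168.50 vagrant up` would move the whole cluster onto another subnet.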

PL/R support

(Optionally) Provide PL/R support out of the box.

Support Docker

This would remove the need for VM overhead on Linux, and on OS X it would require only one VM instead of multiple VMs.
