
hdb-docker's Introduction

hdb-docker

This is intended for dev/test purposes only. If you want a quick way to test a theory or familiarize yourself with core Apache HAWQ (incubating) functionality without all the other eco-system components, this is a good starting point.

This repository contains a lean and quick Apache HAWQ Docker Image that sets up 1 NameNode and 3 DataNode containers with Apache HAWQ (incubating) installed.

Apache HAWQ (incubating) has the following features enabled:

  1. ORCA
  2. PL/Python
  3. PL/R
  4. MADlib

The external data query framework, PXF, is also installed and enabled.
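As an illustration of what PXF enables, an external table over a CSV file in HDFS might look like the sketch below. The table name, HDFS path, and PXF port are assumptions, not part of this setup.

```sql
-- Hypothetical example: expose a CSV file stored in HDFS to HAWQ via PXF.
-- Table name, HDFS path, and PXF port (51200 is the common default) are illustrative.
CREATE EXTERNAL TABLE sales_ext (id int, amount float8)
LOCATION ('pxf://centos7-namenode:51200/tmp/sales.csv?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');
```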

Prerequisites

  1. Docker installed and running on Linux, or docker-machine on OS X
  2. At least 8 GB of RAM
  3. At least 16 GB of disk space

Option-1: BUILD Instructions Simple

git clone https://github.com/jpatel-pivotal/hdb-docker.git
cd hdb-docker/hdb2
make run

Note: This script will configure 1 NN and 3 DNs using a pre-built image that it will download from Docker Hub. The NN is the Apache HAWQ (incubating) master and the DNs are Apache HAWQ (incubating) segments.

Once all 4 containers are running, connect to the NN container as shown in the output of make run. Run tail -f ~/start_hdb.log to watch the Apache HAWQ (incubating) processes start up. It usually takes a few minutes before the Apache HAWQ (incubating) and PXF processes are online. Alternatively, check for running postgres and pxf processes with ps.
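For example, a quick check from the Docker host might look like the following sketch (the container name matches the one used later in this README):

```shell
# Check for running postgres (HAWQ) and pxf processes inside the NN container
# (requires a running Docker daemon and the centos7-namenode container):
#   docker exec centos7-namenode ps -ef | grep -E '[p]ostgres|[p]xf'
# The [p] bracket keeps grep from matching its own command line; demo on canned ps output:
echo 'gpadmin 1234 1 0 10:00 ? 00:00:01 postgres -D /data/hawq/master' | grep -E '[p]ostgres|[p]xf'
```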

Clean up and start the Simple environment again

To start over, use the commands below:

make stop
make clean
make distclean
make run

Connect to a container

To connect to any container use the command below:

docker exec -it <container-name> bash

Replace <container-name> in the command above with the name of the container you want to connect to.

Connecting to Apache HAWQ (incubating)

Once the processes have started, use the commands below to connect to the NN container and run SQL queries in Apache HAWQ (incubating):

docker exec -it centos7-namenode bash
[gpadmin@centos7-namenode data]$ psql
gpadmin=# select version();

Option-2: BUILD Instructions Advanced

If you want to have Apache HAWQ (incubating) along with Spring Cloud Data Flow and Zeppelin, then follow the steps below. Keep in mind that this will spin up a total of 7 containers (so 3 more containers than the simple case). The Advanced version uses more resources on your Docker host so provision accordingly.

git clone https://github.com/jpatel-pivotal/hdb-docker.git
cd hdb-docker/hdb2
make run ZEPP=1 SCDF=1

Note: This script will configure 1 NN and 3 DNs, plus the Zeppelin, Kafka, and Spring Cloud Data Flow containers, using pre-built images that it will download from Docker Hub. The NN is the Apache HAWQ (incubating) master and DN[1-3] are Apache HAWQ (incubating) segments.

Once all 7 containers are running, connect to the NN container as shown in the output of make run. Run tail -f ~/start_hdb.log to watch the Apache HAWQ (incubating) processes start up. It usually takes a few minutes before the Apache HAWQ (incubating) and PXF processes are online. Alternatively, check for running postgres and pxf processes with ps.

You should also have one Zeppelin container, one Kafka container, and one Spring Cloud Data Flow container.

Clean up and start the Advanced environment again

To start over with a clean environment if you ran the advanced build commands, use the commands below:

make stop ZEPP=1 SCDF=1
make clean ZEPP=1 SCDF=1
make distclean ZEPP=1 SCDF=1
make run ZEPP=1 SCDF=1

Restart the Advanced environment

To restart the containers if you ran the advanced build commands, use the command below:

make restart ZEPP=1 SCDF=1

Connecting to Zeppelin UI

If you ran the advanced build commands, you can point your browser at the URL below to connect to the Zeppelin UI:

http://<IP or FQDN of Docker HOST>:9080

Several interpreters come pre-installed with Zeppelin. To use the %psql binding in your notebooks, you must create an instance of the postgresql interpreter and configure it: replace localhost in the postgresql.url property with the name of the NN (centos7-namenode). All other defaults should work out of the box.
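Assuming the interpreter's other defaults are left alone (the port and database path shown here are the stock postgresql interpreter defaults, not something this setup documents), the edited property would look something like:

```properties
# Hypothetical postgresql interpreter setting: point the JDBC URL at the NN
postgresql.url=jdbc:postgresql://centos7-namenode:5432/
```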

The install-interpreter script is located at /usr/local/zeppelin-*/bin/install-interpreter.sh. Below is the list of pre-installed interpreters:

  1. shell
  2. python
  3. postgresql
  4. file
  5. angular
  6. md
  7. jdbc
  8. elasticsearch
  9. hbase

For details on Zeppelin, please follow the documentation.

Connecting to Spring Cloud Data Flow dashboard and starting up the Shell

If you ran the advanced build commands, you can point your browser at the URL below to connect to the SCDF dashboard:

http://<IP or FQDN of Docker HOST>:9393/dashboard

The SCDF shell can be invoked after connecting to the SCDF container using the command below:

java -jar /data/spring-cloud-dataflow-shell-*.BUILD-SNAPSHOT.jar

For details on how to use the shell, please refer to SCDF documentation.

Thank You

The building blocks for this code come from wangzw's Docker effort, which can be found at https://hub.docker.com/r/mayjojo/hawq-devel/. I enhanced it and added components such as Apache HAWQ (incubating), PXF, PL/* languages, MADlib, Zeppelin, and Spring Cloud Data Flow.

hdb-docker's People

Contributors

jemishp


hdb-docker's Issues

Needs a make restart option

A couple of times the Docker daemon has stopped and restarted on me. The containers still exist at that point and just need to be re-run; the current method is to start them manually.
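The manual start this issue describes might look like the loop below. The datanode container names are assumptions; list yours with docker ps -a.

```shell
# Restart already-created containers after a Docker daemon restart.
# Container names below are assumptions; list yours with: docker ps -a
for c in centos7-namenode centos7-datanode1 centos7-datanode2 centos7-datanode3; do
  docker start "$c"
done
```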

Allow automated passing of sysctl params

Have a file that I can put sysctl params in and then pass to the make command via the command line (or have it auto-read), so that I can add params to get HAWQ working in various environments. Right now I have to make manual additions to the makefile.

Add an option to report when services are online

With the new additions, it takes a while to get the system up and running. It would be nice to have a script or a make option (not a great fit, but still an option) to report the status of all the services: something like make status, which would come back with each container and the service status within it. So it would return something like:

namenode, datanode1:
hawq: online
pxf: online
zeppelin container:
zeppelin: online

you get the idea.
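A minimal sketch of such a status check, which could be wired up as a make status target, is the loop below. The container and process names are assumptions based on this setup.

```shell
# Sketch: report per-container service status (e.g. behind a "make status" target).
# Container and process names are assumptions for this environment.
for c in centos7-namenode centos7-datanode1 centos7-datanode2 centos7-datanode3; do
  echo "$c:"
  docker exec "$c" ps -ef | grep -q '[p]ostgres' && echo "  hawq: online" || echo "  hawq: offline"
  docker exec "$c" ps -ef | grep -q '[p]xf'      && echo "  pxf: online"  || echo "  pxf: offline"
done
```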
