sunny5156 / docker-bigdata-playground

Running big data platform using docker and docker-compose. Containers for Hadoop, Hive, Impala, Zookeeper and Postgres.

License: Do What The F*ck You Want To Public License

Makefile 2.31% Dockerfile 37.82% Shell 50.35% Java 9.52%

Big Data Docker Containers

Docker containers for running big data platform. Containers for Hadoop NameNode, Hadoop DataNodes, Hive, Impala, Zookeeper and Postgres.

Building Containers

All containers are built from docker-compose files. Because docker-compose does not support building containers from a base image, a Makefile is included to build them.

Build all Containers

make build

Build Individual Container

make build-hive

Running Containers

All containers can be run using docker-compose. The -p option sets the Compose project name, which docker-compose uses as the prefix for the containers and for the default network it creates.

docker-compose -p bigdata-net up

Individual containers can be run by referencing the container name. This is typically not recommended, however, because several of the containers depend on one another.

docker-compose -p bigdata-net up postgres

Accessing Containers

Use docker-compose to access containers by name.

docker-compose -p bigdata-net exec impala bash
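
Every docker-compose command in this README repeats -p bigdata-net. As a convenience, a small wrapper function can keep the project name in one place. This is only a sketch and is not part of the repository; the names PROJECT and compose are illustrative.

```shell
# Hypothetical helper, not part of the repository: run docker-compose
# with the project name used throughout this README.
PROJECT="${PROJECT:-bigdata-net}"

compose() {
  # Forward all arguments unchanged after the project option.
  docker-compose -p "$PROJECT" "$@"
}
```

With this defined, `compose up -d` starts the containers in the background, `compose ps` lists them, and `compose exec impala bash` opens a shell in the Impala container.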

Container Structure

Adding Data to the HDFS

Copy files to the NameNode container.

docker cp <data-file> <hadoop-container-id>:/

Enter the NameNode Container

docker-compose -p bigdata-net exec namenode bash

Create a directory in the HDFS for the files

hdfs dfs -mkdir -p /user/data/

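Add the files to the HDFS directory. The file was copied to the container's root directory, so reference it as /<data-file> if the shell does not start there.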

hdfs dfs -put <data-file> /user/data/
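
The steps above can be combined into one function run from the host. The following is a minimal sketch, not part of the repository: load_into_hdfs, run, DRY_RUN, and HDFS_DIR are illustrative names, and the container ID must be supplied by the caller. It uses docker exec with the container ID instead of opening an interactive shell.

```shell
# Hypothetical helper, not part of the repository.
HDFS_DIR="${HDFS_DIR:-/user/data}"

# run: print the command when DRY_RUN=1, otherwise execute it.
# The printed form is for inspection only; arguments are not re-quoted.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# load_into_hdfs <data-file> <hadoop-container-id>
load_into_hdfs() {
  file="$1"
  container="$2"
  name="$(basename "$file")"

  # 1. Copy the file to the root directory of the NameNode container.
  run docker cp "$file" "$container:/$name"
  # 2. Create the target directory in HDFS (no error if it already exists).
  run docker exec "$container" hdfs dfs -mkdir -p "$HDFS_DIR"
  # 3. Put the copied file into the HDFS directory.
  run docker exec "$container" hdfs dfs -put "/$name" "$HDFS_DIR/"
}
```

Setting DRY_RUN=1 before calling `load_into_hdfs <data-file> <hadoop-container-id>` prints the three commands without running them, which is a cheap way to check the paths first.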

Running Hive Queries

Using beeline

From the Hive container, run the beeline CLI

beeline

Connect to HiveServer2

!connect jdbc:hive2://localhost:10000

Run Queries

show databases;
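
For scripted use, beeline can also connect and run a statement in one call through its -u (JDBC URL) and -e (query) options. The sketch below assumes the same host and port as the interactive example; hive_jdbc_url, hive_query, HIVE_HOST, and HIVE_PORT are illustrative names, and depending on how HiveServer2 authentication is configured, a user name (-n) and password (-p) may also be needed.

```shell
# Hypothetical helpers, not part of the repository.
HIVE_HOST="${HIVE_HOST:-localhost}"
HIVE_PORT="${HIVE_PORT:-10000}"

# Build the HiveServer2 JDBC URL used by the !connect step above.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s' "$HIVE_HOST" "$HIVE_PORT"
}

# Run a single statement non-interactively, then exit.
hive_query() {
  beeline -u "$(hive_jdbc_url)" -e "$1"
}
```

From the Hive container, `hive_query "show databases;"` would then run the same query as the interactive session.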

Using JDBC with Maven

From the Hive container, navigate to the directory containing the pom.xml file and the project files

cd jdbc

Run the Maven package command

mvn package

Run the Java Project

cd target/
java -jar hive-jdbc-example-1.0-jar-with-dependencies.jar

Running Impala Queries

Using Impala Shell

Start the Impala Shell

impala-shell -i localhost

Run Queries

show databases;
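
The Impala Shell can likewise run a single query without an interactive session by passing it with the -q option. This is a small sketch under the same assumption as above (Impala reachable on localhost from inside the Impala container); impala_query and IMPALA_HOST are illustrative names, not part of the repository.

```shell
# Hypothetical helper, not part of the repository.
IMPALA_HOST="${IMPALA_HOST:-localhost}"

# Run one query against the Impala daemon and exit.
impala_query() {
  impala-shell -i "$IMPALA_HOST" -q "$1"
}
```

For example, `impala_query "show databases;"` runs the query shown above and returns to the shell prompt.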
