sunny5156 / docker-bigdata-playground

This project forked from adavarski/docker-bigdata-playground

Run a big data platform using Docker and docker-compose. Containers for Hadoop, Hive, Impala, Zookeeper and Postgres.

License: Do What The F*ck You Want To Public License

Makefile 2.31% Dockerfile 37.82% Shell 50.35% Java 9.52%

Big Data Docker Containers

Docker containers for running a big data platform. Containers are provided for the Hadoop NameNode, Hadoop DataNodes, Hive, Impala, Zookeeper and Postgres.

Building Containers

All containers are built from docker-compose files. Because docker-compose does not support building containers from a base image, a Makefile is included to build the containers.

Build all Containers

make build

Build Individual Container

make build-hive

Running Containers

All containers can be run using docker-compose. The -p option sets the project name, which docker-compose uses as the prefix for the containers' default network.

docker-compose -p bigdata-net up

Individual containers can be run by referencing the service name. This is typically not recommended, however, because several of the containers depend on one another.

docker-compose -p bigdata-net up postgres
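For day-to-day use it can be convenient to run the stack in the background and inspect it separately. The following is a minimal sketch, not part of the repository: the helper names and the DRY_RUN switch are made up for illustration, while `up -d`, `ps`, `logs -f` and `down` are standard docker-compose subcommands.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative, not part of the repo.
# With DRY_RUN set to a non-empty value, each command is printed, not executed.
PROJECT="bigdata-net"

# Start every service in the background (-d detaches from the terminal).
stack_up()   { ${DRY_RUN:+echo} docker-compose -p "$PROJECT" up -d; }

# List the project's containers and their state.
stack_ps()   { ${DRY_RUN:+echo} docker-compose -p "$PROJECT" ps; }

# Follow the logs of one service, e.g. `stack_logs namenode`.
stack_logs() { ${DRY_RUN:+echo} docker-compose -p "$PROJECT" logs -f "$1"; }

# Stop and remove the project's containers and its default network.
stack_down() { ${DRY_RUN:+echo} docker-compose -p "$PROJECT" down; }
```

Source the file from the directory containing the compose file, then call `stack_up` followed by `stack_ps` to check that all services have started.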

Accessing Containers

Use docker-compose to access containers by name.

docker-compose -p bigdata-net exec impala bash

Container Structure

Adding Data to the HDFS

  1. Copy files to the NameNode container.
docker cp <data-file> <hadoop-container-id>:/
  2. Enter the NameNode container.
docker-compose -p bigdata-net exec namenode bash
  3. Create a directory in HDFS for the files.
hdfs dfs -mkdir -p /user/data/
  4. Add the files to the HDFS directory.
hdfs dfs -put <data-file> /user/data/
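The steps above can also be scripted from the host without opening an interactive shell. This is a sketch under assumptions: the function name and DRY_RUN switch are hypothetical, `-T` disables TTY allocation so `exec` works from a script, and the `hdfs` binary is assumed to be on the PATH of a non-interactive `exec` session in the namenode container.

```shell
#!/bin/sh
# Sketch only: load_into_hdfs and DRY_RUN are illustrative names.
# With DRY_RUN set to a non-empty value, each command is printed, not executed.
PROJECT="bigdata-net"

# usage: load_into_hdfs <hadoop-container-id> <data-file>
load_into_hdfs() {
  cid="$1"
  file="$2"
  name="$(basename "$file")"   # docker cp places the file at /<name>

  # Step 1: copy the file into the container's root directory.
  ${DRY_RUN:+echo} docker cp "$file" "$cid:/" || return 1

  # Steps 2-3: create the HDFS directory without an interactive shell.
  ${DRY_RUN:+echo} docker-compose -p "$PROJECT" exec -T namenode \
    hdfs dfs -mkdir -p /user/data/ || return 1

  # Step 4: put the copied file into the HDFS directory.
  ${DRY_RUN:+echo} docker-compose -p "$PROJECT" exec -T namenode \
    hdfs dfs -put "/$name" /user/data/
}
```

Running it with `DRY_RUN=1` first prints the three commands so they can be checked before anything is copied. The sketch uses the absolute path `/<name>` in the last step because step 1 copies the file to the container's root.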

Running Hive Queries

Using beeline

  1. From the Hive container, run the beeline CLI.
beeline
  2. Connect to HiveServer2.
!connect jdbc:hive2://localhost:10000
  3. Run queries.
show databases;
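For scripted use, beeline can connect and run a single statement in one call through its `-u` (JDBC URL) and `-e` (statement) options. The sketch below wraps that in a host-side helper; the helper name and DRY_RUN switch are hypothetical, and the service name `hive` is an assumption inferred from the `make build-hive` target.

```shell
#!/bin/sh
# Sketch only: hive_query and DRY_RUN are illustrative; the service name
# "hive" is assumed, not confirmed by the README.
# With DRY_RUN set, the command is printed (quoting is not preserved).
PROJECT="bigdata-net"
HIVE_URL="jdbc:hive2://localhost:10000"

# usage: hive_query "show databases;"
hive_query() {
  # -u gives the JDBC URL, -e runs one statement and exits.
  ${DRY_RUN:+echo} docker-compose -p "$PROJECT" exec -T hive \
    beeline -u "$HIVE_URL" -e "$1"
}
```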

Using JDBC with Maven

  1. From the Hive container, navigate to the directory containing the pom.xml file and project files.
cd jdbc
  2. Run the Maven package command.
mvn package
  3. Run the Java project.
cd target/
java -jar hive-jdbc-example-1.0-jar-with-dependencies.jar
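The same build-and-run sequence can be chained so that it stops at the first failing step. This is a small sketch meant to be run inside the Hive container from the directory that contains `jdbc`; the function name and DRY_RUN switch are illustrative only.

```shell
#!/bin/sh
# Sketch only: build_and_run_jdbc and DRY_RUN are illustrative names.
# With DRY_RUN set to a non-empty value, each command is printed, not executed.

# The parentheses run the body in a subshell, so `cd` does not change
# the caller's working directory.
build_and_run_jdbc() (
  ${DRY_RUN:+echo} cd jdbc || exit 1
  ${DRY_RUN:+echo} mvn package || exit 1
  ${DRY_RUN:+echo} java -jar target/hive-jdbc-example-1.0-jar-with-dependencies.jar
)
```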

Running Impala Queries

Using Impala Shell

  1. Start the Impala Shell.
impala-shell -i localhost
  2. Run queries.
show databases;
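impala-shell can also run a single query non-interactively with its `-q` option. The following sketch calls it from the host through the `impala` service used earlier in this README; the helper name and DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch only: impala_query and DRY_RUN are illustrative names.
# With DRY_RUN set, the command is printed (quoting is not preserved).
PROJECT="bigdata-net"

# usage: impala_query "show databases;"
impala_query() {
  # -i names the impalad host, -q runs one query and exits.
  ${DRY_RUN:+echo} docker-compose -p "$PROJECT" exec -T impala \
    impala-shell -i localhost -q "$1"
}
```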
