willie-engelbrecht / ingestnifitophoenix

Ingesting data into Phoenix with NiFi, using Excel and Zeppelin to extract the data.

License: Apache License 2.0


Introduction

Apache Phoenix enables OLTP and operational analytics in Hadoop for low latency applications by combining the best of both worlds:

  • the power of standard SQL and JDBC APIs with full ACID transaction capabilities and
  • the flexibility of late-bound, schema-on-read capabilities from the NoSQL world by leveraging HBase as its backing store

Apache Phoenix is fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce.

Read more about the Apache Phoenix project at https://phoenix.apache.org/

The demo below outlines how to integrate HDF (NiFi) with Phoenix: NiFi sources the data and pushes it into Phoenix, where it can be queried for BI analysis with an ODBC tool such as Excel. Any tool capable of using an ODBC connection will be able to source this data from Phoenix.

Setup

Create your table in Phoenix by running the phoenix-sqlline command in your terminal:

[root@hdp3 ~]# phoenix-sqlline
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix: none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:
Connected to: Phoenix (version 5.0)
Driver: PhoenixEmbeddedDriver (version 5.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
149/149 (100%) Done
Done
sqlline version 1.2.0

0: jdbc:phoenix:> !sql create table employees (emp_no integer primary key, birth_date varchar, first_name varchar, last_name varchar, gender varchar, hire_date varchar, random_nr integer);
No rows affected (0.819 seconds)
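
If you want to sanity-check the new table before wiring up NiFi, you can upsert a dummy row from the same sqlline session and read it back (the values below are made up for illustration; the final delete keeps the table empty for the demo):

upsert into employees values (1, '1990-01-01', 'Test', 'User', 'M', '2019-01-01', 42);
select * from employees;
delete from employees where emp_no = 1;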

Create the same employees table in Hive 3, for the streaming portion:

[root@hdp3 ~]$ hive
Connecting to jdbc:hive2://hdp3.1.0-0.home.local:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
19/04/23 15:54:31 [main]: INFO jdbc.HiveConnection: Connected to hdp3.1.0-0.home.local:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive

0: jdbc:hive2://hdp3.1.0-0.home.local:2181/de> create table employees (emp_no int, birth_date string, first_name string, last_name string, gender string, hire_date string);
INFO  : OK
No rows affected (0.306 seconds)
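
A note on the streaming side: PutHive3Streaming needs a transactional (ACID) table. On HDP 3.x, managed Hive tables default to transactional ORC, so the plain create table above is enough; if your cluster's defaults differ, you can make it explicit:

create table employees (emp_no int, birth_date string, first_name string, last_name string, gender string, hire_date string)
stored as orc
tblproperties ('transactional'='true');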

Also create your Kafka topic, using the correct Hostname/IP address of your machine:

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --topic topic_data --replication-factor 1 --partitions 2 --zookeeper localhost:2181
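
To confirm the topic exists, and to push a quick test message through it (assuming the HDP default broker port 6667), you can run:

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe --topic topic_data --zookeeper localhost:2181
echo 'test message' | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list localhost:6667 --topic topic_data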

Import the NiFi flow

Import the example NiFi flow (demo.xml) into your instance of NiFi:

[screenshot]

Select the file on your computer and upload it:

[screenshot]

Click on the Template button to load the example into your NiFi instance:

[screenshot]

Then load a copy onto your canvas:

[screenshot]

Once imported, update the three Kafka processors and set the correct Hostname/IP of your Kafka broker:

[screenshot]
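
For reference, the properties to change on each Kafka processor are along these lines (exact property names vary slightly between processor versions; the hostname is a placeholder):

Kafka Brokers: your-broker-host:6667
Topic Name: topic_data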

Double-click on PutSQL, and update the Hostname/IP of your Phoenix server in the DBCPConnectionPool:

[screenshot]
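
The DBCPConnectionPool settings for Phoenix typically look like this (the hostname, znode, and driver path are assumptions for an unsecured HDP 3 install; adjust to your cluster):

Database Connection URL: jdbc:phoenix:your-phoenix-host:2181:/hbase-unsecure
Database Driver Class Name: org.apache.phoenix.jdbc.PhoenixDriver
Database Driver Location(s): /usr/hdp/current/phoenix-client/phoenix-client.jar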

Also update the Hostname/IP of your HiveMetastore URI in the PutHive3Streaming processor:

[screenshot]
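
The metastore URI usually follows the thrift://metastore-host:9083 pattern, 9083 being the common Hive metastore default:

Hive Metastore URI: thrift://your-hive-host:9083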

You can now start your flows in the green box, and see the data being pushed to the Kafka topic:

[screenshot]
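
If you want to watch the messages arriving on the topic, a console consumer works well (again assuming broker port 6667):

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server localhost:6667 --topic topic_data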

Start the ingest from Kafka to Phoenix, and Kafka to Hive3Streaming:

[screenshot]

[screenshot]

While it's running, you can go back to the command line and test the row count:

sqlline version 1.2.0
0: jdbc:phoenix:> select count(*) from employees;
+-----------+
| COUNT(1)  |
+-----------+
| 30        |
+-----------+
1 row selected (0.086 seconds)
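
You can also peek at a few of the ingested rows directly, since Phoenix supports LIMIT:

select * from employees limit 5;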

Another example of the data from Phoenix:

[screenshot]

Using ODBC, you can use a tool like Excel to load the data for further analysis:

[screenshot]

Pick which DSN to use:

[screenshot]
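
If you are using the Phoenix ODBC driver, the DSN usually points at the Phoenix Query Server rather than at ZooKeeper; 8765 is the usual Query Server default port (an assumption, so check your cluster):

Host: your-phoenix-host
Port: 8765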

Pick your table in the Phoenix ODBC connection:

[screenshot]

Now you have imported your data:

[screenshot]

You can do the same from Zeppelin, querying via %jdbc(Phoenix) to pull and visualise the data:

Example 1:

[screenshot]

Example 2:

[screenshot]
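
A Zeppelin paragraph along these lines produces charts like the ones above (a minimal sketch, assuming a JDBC interpreter configured for Phoenix under the name phoenix):

%jdbc(phoenix)
select gender, count(*) as employee_count
from employees
group by gender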
