renjithraju / kafka-connect-gcp-bigtable

This project forked from sanjuthomas/kafka-connect-gcp-bigtable


Kafka Sink Connector for GCP Bigtable

Home Page: http://sanjuthomas.com

License: MIT License


Kafka Sink Connect for GCP Bigtable

This sink-only Kafka Connect plugin can be used to stream messages from Apache Kafka to Bigtable, the wide-column store on Google Cloud Platform (GCP).

What is Apache Kafka?

Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. For more details, please refer to the Apache Kafka home page.

What is GCP Bigtable?

Bigtable is a compressed, high-performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable, and a few other Google technologies. On May 6, 2015, a public version of Bigtable was made available as a service in the Google Cloud Platform. For more details, please refer to the GCP Bigtable home page.

High Level Architecture

This project leverages the bigtable-client-core library (no HBase) to stream data to GCP Bigtable. Internally, bigtable-client-core uses the gRPC framework to talk to GCP Bigtable.

[Architecture diagram: Kafka Connect GCP Bigtable]

Prerequisites

Apache ZooKeeper and Apache Kafka must be installed and running on your machine. Please refer to the respective sites to download and start ZooKeeper and Kafka. You will also need Java version 8 or above.

Tested Software Versions

| Software | Version | Note |
|----------|---------|------|
| Java | 1.8.0_161 | You may use Java 8 or above. |
| Kafka | 2.11-2.1.0 | Please refer. |
| ZooKeeper | 3.4.13 | Please refer. |
| bigtable-client-core | 1.8.0 | Please refer. |
| Kafka connect-api | 2.1.0 | Please refer. |
| grpc-netty-shaded | 1.17.1 | Please refer. |

Configurations

bigtable-sink.properties

| Property | Value | Data Type | Description |
|----------|-------|-----------|-------------|
| name | bigtable-sink | String | Name of the sink connector. |
| connector.class | BigtableSinkConnector | String | Simple name of the connector class. |
| tasks.max | 1 | Number | Number of tasks. |
| topics | demo-topic | String | Comma-separated list of topics. |
| topics.config.files.location | kafka_home/config | String | There should be one yml file per topic. |
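
Putting these values together, the bigtable-sink.properties file might look like the following (a sketch assembled from the table above; adjust the paths for your environment):

```
name=bigtable-sink
connector.class=BigtableSinkConnector
tasks.max=1
topics=demo-topic
topics.config.files.location=kafka_home/config
```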

demo-topic.yml (one yml file per topic)

| Property | Value | Data Type | Description |
|----------|-------|-----------|-------------|
| keyFile | /home/keys/demo-instance-key.json | String | GCP connect key file. This is a topic-level configuration because you could subscribe to multiple topics, and messages from one topic may go to a table in instance A while messages from another topic go to a table in instance B. |
| project | demo-project | String | Name of the GCP project. |
| instance | demo-instance | String | Name of the GCP Bigtable instance. |
| table | demo-table | String | Name of the GCP Bigtable table. |
| transformer | kafka.connect.gcp.transform.JsonEventTransformer | String | Transformer class that transforms the message into a Bigtable-writable row. You may provide your own implementation. |
| keyQualifiers | exchange, symbol | Array | Bigtable row key qualifiers. The configured element names are used to construct the row keys. |
| keyDelimiter | - | String | Delimiter to use when more than one element constructs the row key. |
| families | data, metadata | Array | Column families in the Bigtable table. This configuration is used by the transformer. |
| familyQualifiers | data: exchange, symbol, name, sector; metadata: event_time, create_time, processing_time, topic | Array | Column family to columns mapping. |
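
Assembled from the table above, a complete demo-topic.yml would look roughly like this (a sketch; the exact YAML shape the connector expects may differ slightly, so treat it as illustrative):

```yaml
keyFile: /home/keys/demo-instance-key.json
project: demo-project
instance: demo-instance
table: demo-table
transformer: kafka.connect.gcp.transform.JsonEventTransformer
keyQualifiers:
  - exchange
  - symbol
keyDelimiter: "-"   # quoted so YAML reads the dash as a string
families:
  - data
  - metadata
familyQualifiers:
  - data:
      - exchange
      - symbol
      - name
      - sector
  - metadata:
      - event_time
      - create_time
      - processing_time
      - topic
```

With this mapping, a JSON message such as {"exchange": "NYSE", "symbol": "ABC", "name": "ABC Corp", "sector": "Tech"} would presumably be written with the row key NYSE-ABC (the keyQualifiers joined by the keyDelimiter).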

Constraints

The current configuration system supports streaming messages from a given topic to a given table. You can subscribe to any number of topics, but a topic can be mapped to one and only one table. For example, if you subscribe to a topic named demo-topic, you should have a yml file named demo-topic.yml. That yml file contains all the configuration required to transform and write data into Bigtable.

As of today, there is transformer support only for JSON messages. I'm planning to add an Avro message transformer in the next version.

How to deploy the connector?

This is a Maven project. To create an uber jar, execute the following Maven goals.

```
mvn clean compile package shade:shade install
```

Copy the artifact kafka-connect-gcp-bigtable-1.0.0.jar to the kafka_home/lib folder.

Copy the bigtable-sink.properties file into the kafka_home/config folder. Update the contents of the property file according to your environment.

Alternatively, you may keep kafka-connect-gcp-bigtable-1.0.0.jar in another directory and add that directory to the Kafka classpath before starting the connector.
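
For example, Kafka's launcher scripts honor the CLASSPATH environment variable, so something like the following should work (a sketch; /opt/connectors is a hypothetical directory):

```sh
# Hypothetical jar location; adjust to your environment.
export CLASSPATH=/opt/connectors/kafka-connect-gcp-bigtable-1.0.0.jar
```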

How to start the connector in standalone mode?

Open a shell prompt, move to kafka_home and execute the following.

```
bin/connect-standalone.sh config/bigtable-connect-standalone.properties config/bigtable-sink.properties
```
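
The bigtable-connect-standalone.properties file is a standard Kafka Connect standalone worker configuration. A minimal sketch, assuming schemaless JSON messages (every property below is a stock Kafka Connect worker setting, not specific to this project):

```
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
```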

How to start the connector in distributed mode?

Open a shell prompt, change your working directory to kafka_home and execute the following.

```
bin/connect-distributed.sh config/bigtable-connect-distributed.properties config/bigtable-sink.properties
```
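
Note that a distributed worker normally takes only its own configuration on the command line; connector configurations are usually submitted to the running worker through the Kafka Connect REST API (port 8083 by default). A sketch that reuses the properties from bigtable-sink.properties above:

```sh
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8083/connectors \
  -d '{
        "name": "bigtable-sink",
        "config": {
          "connector.class": "BigtableSinkConnector",
          "tasks.max": "1",
          "topics": "demo-topic",
          "topics.config.files.location": "kafka_home/config"
        }
      }'
```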

Questions?

Either create an issue in this project or send an email to [email protected]. Thanks!
