This project is a fork of thilinamb/flume-ng-kafka-sink.

An Apache Flume Sink implementation to publish data to Apache Kafka

License: Apache License 2.0

Flume-NG-Kafka-Sink

This is a Flume Sink implementation that can publish data to a Kafka topic. The objective is to integrate Flume with Kafka so that pull-based processing systems such as Apache Storm can process data coming through various Flume sources such as Syslog.

This sink is now part of the official Apache Flume distribution (from v1.6 onwards), along with significant improvements.

Example Use Case

A realtime Syslog processing architecture using Apache Flume, Apache Kafka and Apache Storm is described in the Realtime Syslog Processing post.

Dependency Versions

  • Apache Flume - 1.5.0
  • Apache Kafka - 0.8.1.1

Prerequisites

Building the project

Apache Maven is used to build the project. The Apache Maven download page contains download links and an installation guide for various operating systems.

Issue the following command:

    mvn clean install

This will compile the project, and the binary distribution (flume-kafka-sink-dist-x.x.x-bin.zip) will be copied into the ${project_root}/dist/target directory.

Setting up

  1. Build the project as per the instructions in the previous subsection.
  2. Unzip the binary distribution (flume-kafka-sink-dist-x.x.x-bin.zip) inside ${project_root}/dist/target.
  3. There are two ways to include this custom sink in a Flume binary installation.

Recommended Approach

  • Create a new directory inside the plugins.d directory, which is located in ${FLUME_HOME}. If the plugins.d directory does not exist, create it. We will call this new directory kafka-sink; you can give it any name depending on the naming conventions you prefer.
  • Inside this new directory (kafka-sink), create two subdirectories called lib and libext.
  • The jar files for this sink can be found inside the lib directory of the extracted archive. Copy flume-kafka-sink-impl-x.x.x.jar into the plugins.d/kafka-sink/lib directory, then copy the rest of the jars into the plugins.d/kafka-sink/libext directory.

This is how it will look at the end:

${FLUME_HOME}
 |-- plugins.d
     |-- kafka-sink
         |-- lib
             |-- flume-kafka-sink-impl-x.x.x.jar
         |-- libext
             |-- kafka_x.x-x.x.x.x.jar
             |-- metrics-core-x.x.x.jar
             |-- scala-library-x.x.x.jar

More details can be found in the Flume user guide.

OR

Quick and Dirty Approach

  • Copy the jar files inside the lib directory of the extracted archive into ${FLUME_HOME}/lib.

Configuration

The following parameters are supported at the moment.

  • type

    • The sink type. This should be set to com.thilinamb.flume.sink.KafkaSink.
  • topic [optional]

    • The Kafka topic to which the messages will be published. If this parameter is set, every message will be published to the same topic; if dynamic topics are required, a preprocessor can be used instead of a static topic. One of topic or preprocessor should be provided, because the topic cannot be null when publishing to Kafka; if neither is provided, messages fall back to a default topic called default-flume-topic.
  • preprocessor [optional]

    • This is an extension point provided to support dynamic topics and keys. It can also be used to modify messages before they are published to Kafka. The fully qualified class name of the preprocessor implementation should be provided here; refer to the next subsection for more about preprocessors. If a preprocessor is not configured, a static topic should be used as explained above, and the messages will not be keyed. In a primitive setup, configuring a static topic would suffice.
  • Kafka Producer Properties

    • These properties are used to configure the Kafka producer. Any producer property supported by Kafka can be used; the only requirement is to prepend the property name with the prefix kafka.. For instance, the metadata.broker.list property should be written as kafka.metadata.broker.list. Take a look at the sample configuration provided in the conf directory of the distribution; a minimal example is also sketched below.
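
For illustration, here is a minimal sketch of a sink configuration using a static topic. The agent name (a1), sink name (k1), channel name (c1), topic name and broker address below are placeholders, not defaults of this project:

    a1.sinks.k1.type = com.thilinamb.flume.sink.KafkaSink
    a1.sinks.k1.topic = syslog-events
    a1.sinks.k1.kafka.metadata.broker.list = localhost:9092
    a1.sinks.k1.kafka.serializer.class = kafka.serializer.StringEncoder
    a1.sinks.k1.kafka.request.required.acks = 1
    a1.sinks.k1.channel = c1

Here metadata.broker.list, serializer.class and request.required.acks are standard Kafka 0.8 producer properties; the kafka. prefix marks them for pass-through to the producer.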

Implementing a preprocessor

Implementing a custom preprocessor is useful for supporting dynamic topics and keys; preprocessors also support message transformations. The requirement is to implement the interface com.thilinamb.flume.sink.MessagePreprocessor. The javadocs of this interface provide a detailed description of the methods, their parameters, etc. There are three methods that need to be implemented, and the method names are self-explanatory.

  • public String extractKey(Event event, Context context)
  • public String extractTopic(Event event, Context context)
  • public String transformMessage(Event event, Context context)

The class 'com.thilinamb.flume.sink.example.SimpleMessagePreprocessor' inside the 'example' module is an example implementation of a preprocessor.
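
As a rough sketch of what such an implementation might look like, assuming the three methods listed above and the standard Flume Event and Context API types (the class name, the host header and the topic-prefix setting are hypothetical, introduced only for illustration):

    import java.nio.charset.StandardCharsets;

    import org.apache.flume.Context;
    import org.apache.flume.Event;

    import com.thilinamb.flume.sink.MessagePreprocessor;

    // Hypothetical preprocessor that keys each message by the host that
    // produced it and routes it to a per-host topic. Assumes the source
    // sets a "host" header on each event.
    public class HostBasedPreprocessor implements MessagePreprocessor {

        @Override
        public String extractKey(Event event, Context context) {
            // Key by host so all messages from one host land in the
            // same Kafka partition.
            String host = event.getHeaders().get("host");
            return host != null ? host : "unknown-host";
        }

        @Override
        public String extractTopic(Event event, Context context) {
            // Derive the topic from a (hypothetical) "topic-prefix"
            // setting in the sink configuration.
            String prefix = context.getString("topic-prefix", "syslog");
            return prefix + "-" + extractKey(event, context);
        }

        @Override
        public String transformMessage(Event event, Context context) {
            // Publish the raw event body unchanged.
            return new String(event.getBody(), StandardCharsets.UTF_8);
        }
    }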

After implementing the preprocessor, compile it into a jar, add it to the Flume classpath together with the rest of the jars (copy it to libext if you are using the plugins.d approach, or to ${FLUME_HOME}/lib if you are using the other approach), and configure the preprocessor parameter with its fully qualified class name. For instance:

a1.sinks.k1.preprocessor = com.thilinamb.flume.sink.example.SimpleMessagePreprocessor

Questions and Feedback

Please file a bug or contact me via email regarding any bug you encounter or any other feedback.
