isabella232 / layer-apache-flume-twitter

This project forked from juju-solutions/layer-apache-flume-twitter

The Apache Flume twitter agent layered charm

License: Apache License 2.0

layer-apache-flume-twitter's Introduction

Overview

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application. Learn more at flume.apache.org.

This charm provides a Flume agent designed to process tweets from the Twitter Streaming API and send them to the apache-flume-hdfs agent for storage in the shared filesystem (HDFS) of a connected Hadoop cluster. This leverages the TwitterSource jar packaged with Flume. Learn more about the 1% firehose.

Prerequisites

The Twitter Streaming API requires developer credentials. You'll need to configure those for this charm. Find your credentials (or create an account if needed) here.

Create a secret.yaml file with your Twitter developer credentials:

flume-twitter:
    twitter_access_token: 'YOUR_TOKEN'
    twitter_access_token_secret: 'YOUR_TOKEN_SECRET'
    twitter_consumer_key: 'YOUR_CONSUMER_KEY'
    twitter_consumer_secret: 'YOUR_CONSUMER_SECRET'
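
Before deploying, it can help to confirm that the file defines all four credentials and that no placeholder values remain. The following is a minimal sketch, not part of the charm: the function name check_twitter_secrets is hypothetical, and the check assumes the indented, one-key-per-line layout shown above.

```shell
# Hypothetical helper (not part of the charm): verify that a config file
# defines the four credential keys and that none still holds a 'YOUR_...'
# placeholder. Assumes each key sits on its own indented line.
check_twitter_secrets() {
    local file="$1" key status=0
    for key in twitter_access_token twitter_access_token_secret \
               twitter_consumer_key twitter_consumer_secret; do
        if ! grep -Eq "^[[:space:]]+${key}:" "$file"; then
            # The key does not appear as an indented "key:" line.
            echo "missing key: ${key}" >&2
            status=1
        elif grep -Eq "^[[:space:]]+${key}:[[:space:]]*[\"']?YOUR_" "$file"; then
            # The value still starts with the YOUR_ placeholder prefix.
            echo "placeholder value for: ${key}" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_twitter_secrets secret.yaml returns a non-zero status and names each problem key on stderr if anything is missing or unedited.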

Usage

This charm uses the Hadoop base layer and the HDFS interface to pull its dependencies and act as a client to a Hadoop namenode.

You may manually deploy the recommended environment as follows:

juju deploy apache-hadoop-namenode namenode
juju deploy apache-hadoop-resourcemanager resourcemgr
juju deploy apache-hadoop-slave slave
juju deploy apache-hadoop-plugin plugin

juju add-relation namenode slave
juju add-relation resourcemgr slave
juju add-relation resourcemgr namenode
juju add-relation plugin resourcemgr
juju add-relation plugin namenode

Deploy the Flume HDFS agent:

juju deploy apache-flume-hdfs flume-hdfs
juju add-relation flume-hdfs plugin

Now that the base environment has been deployed (either via quickstart or manually), you are ready to add the apache-flume-twitter charm and relate it to the flume-hdfs agent:

juju deploy apache-flume-twitter flume-twitter --config=secret.yaml
juju add-relation flume-twitter flume-hdfs

Make sure you name the service flume-twitter so that it matches the first line of secret.yaml.
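
One way to check this before deploying is to read the first top-level key out of the config file and compare it with the service name. This is a sketch under the assumption that the file follows the simple layout shown earlier; config_service_name is a hypothetical helper, not something the charm provides.

```shell
# Hypothetical helper: print the first top-level key of a YAML config file,
# meaning the first unindented, non-comment line that ends in a colon.
config_service_name() {
    sed -n -E '/^[^[:space:]#][^:]*:[[:space:]]*$/{s/:[[:space:]]*$//;p;q;}' "$1"
}
```

For the secret.yaml shown earlier, config_service_name secret.yaml prints flume-twitter, which is the name to pass to juju deploy.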

That's it! Once the Flume agents start, tweets will start flowing into HDFS via the flume-twitter and flume-hdfs charms. Flume may include multiple events in each file written to HDFS. This is configurable with various options on the flume-hdfs charm. See descriptions of the roll_* options on the apache-flume-hdfs charm store page for more details.

Flume will write files to HDFS in the following location: /user/flume/<event_dir>/<yyyy-mm-dd>/FlumeData.<id>. The <event_dir> subdirectory is configurable and set to flume-twitter by default for this charm.
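
The directory layout above can be expressed as a small helper that builds the per-day directory for a given event directory and date. This is only an illustrative sketch: flume_day_dir is a hypothetical name, the default date comes from the local clock of the machine running the function, and whether that matches the date Flume uses for a given event is an assumption.

```shell
# Hypothetical helper: build /user/flume/<event_dir>/<yyyy-mm-dd>.
# Defaults: event_dir is flume-twitter (this charm's default) and the date
# is today's date according to the local clock.
flume_day_dir() {
    local event_dir="${1:-flume-twitter}"
    local day="${2:-$(date +%Y-%m-%d)}"
    printf '/user/flume/%s/%s\n' "$event_dir" "$day"
}
```

The printed path can then be passed to hdfs dfs -ls on a unit that has the Hadoop client installed.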

Test the deployment

To verify this charm is working as intended, SSH to the flume-hdfs unit and locate an event:

juju ssh flume-hdfs/0
hdfs dfs -ls /user/flume/<event_dir>               # <-- find a date
hdfs dfs -ls /user/flume/<event_dir>/<yyyy-mm-dd>  # <-- find an event

Since our tweets are serialized in Avro format, you'll need to copy the file locally and use the hdfs dfs -text command to view it:

hdfs dfs -copyToLocal /user/flume/<event_dir>/<yyyy-mm-dd>/FlumeData.<id>.avro /home/ubuntu/myFile.txt
hdfs dfs -text file:///home/ubuntu/myFile.txt

You may not recognize the body of the tweet if it's not in a language you understand (remember, this is a 1% firehose from tweets all over the world). You may have to try a few different events before you find a tweet worth reading. Happy hunting!

layer-apache-flume-twitter's People

Contributors

johnsca, ktsakalozos
