grayworm / ingestion

This project forked from stratio/ingestion

Flume - Ingestion, an Apache Flume distribution

Home Page: https://stratio.atlassian.net/wiki/display/PLATFORM/STRATIO+INGESTION

License: Apache License 2.0

Languages: Java 92.24%, Shell 5.32%, PowerShell 2.20%, Batchfile 0.15%, Gherkin 0.09%

Stratio Ingestion

Contents

  • Introduction
  • Stratio Ingestion components
  • Details about Stratio Ingestion
  • Compile & Package
  • FAQ

Introduction

Stratio Ingestion started as a fork of Apache Flume (1.6). It provides:

Custom sources and sinks, developed by Stratio

  • SNMP (v1, v2c and v3)
  • Redis, Kafka (0.8.1.1)
  • MongoDB, JDBC, Cassandra and Druid
  • Stratio Decision (Complex Event Processing engine)
  • REST client, Flume agents stats

Several bug fixes

  • Some of them are important, such as Unicode support

Several enhancements of Flume's sources & sinks

  • ElasticSearch mapper, for example

You can find more documentation about us here

Stratio Ingestion components

  • Data transporter and collector: Apache Flume
  • Data extractor and transformer: Morphlines
  • Custom source types to read data from:
    • REST com.stratio.ingestion.source.rest.RestSource
    • Redis FlumeStats com.stratio.ingestion.source.redis.RedisSource
    • SNMPTraps com.stratio.ingestion.source.snmptraps.SNMPSource
    • IRC com.stratio.ingestion.source.irc.IRCSource
  • Custom sink types to write data to:
    • Cassandra com.stratio.ingestion.sink.cassandra.CassandraSink
    • MongoDB com.stratio.ingestion.sink.mongodb.MongoSink
    • Stratio Decision
    • JDBC com.stratio.ingestion.sink.jdbc.JDBCsink
    • Kafka com.stratio.ingestion.sink.kafka.KafkaSink
    • Druid com.stratio.ingestion.sink.druid.DruidSink
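As a rough illustration of how these class names are used, the sketch below writes a minimal Flume agent definition that sets a sink's `type` to one of the fully qualified class names listed above. The agent name `agent`, the component names `r1`, `c1` and `k1`, the file name `ingestion-example.conf` and the netcat port are placeholders chosen for this example. The sink's own required properties are not listed in this README, so they are left out, and the agent will probably not start until they are added.

```shell
# Write a minimal agent definition. All names here are placeholders.
cat > ingestion-example.conf <<'EOF'
agent.sources = r1
agent.channels = c1
agent.sinks = k1

# Stock Flume netcat source, used only to have something feeding the agent
agent.sources.r1.type = netcat
agent.sources.r1.bind = localhost
agent.sources.r1.port = 44444
agent.sources.r1.channels = c1

# In-memory channel connecting the source to the sink
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000

# Custom sink, referenced by its fully qualified class name
agent.sinks.k1.type = com.stratio.ingestion.sink.mongodb.MongoSink
agent.sinks.k1.channel = c1
# Sink-specific settings (connection details and so on) are assumptions
# left to the reader; they are intentionally omitted here.
EOF
```

With stock Apache Flume, an agent defined this way is typically started with `flume-ng agent --conf conf --conf-file ingestion-example.conf --name agent`; the launcher shipped in the Stratio Ingestion packages may differ.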

Details about Stratio Ingestion

Stratio Ingestion is based on Apache Flume, so the first question is:

What is Apache Flume?

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.

Flume is not limited to logs: it offers a wide range of sources, sinks and transformations.

In addition, a sink can be a big data store or another real-time system (Apache Kafka, Spark Streaming).

Interesting facts about Stratio Ingestion

  • Flume Ingestion is Apache Flume "on steroids" :)

  • We use Kite SDK (Morphlines) extensively to improve the T (transform) in ETL, and we have also developed a number of custom transformations.

  • Stratio Ingestion is fully open source, and we work closely with the Flume community.
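For readers who have not seen a morphline, the sketch below writes a very small one using only stock Kite SDK commands (`readLine` and `logDebug`). The file name `morphline-example.conf` and the id `morphline1` are placeholders, and none of the Stratio custom transformations are shown, since this README does not name them. How the file is wired into a Flume agent is not covered here.

```shell
# Write a minimal morphline definition (HOCON syntax, as used by Kite SDK).
cat > morphline-example.conf <<'EOF'
morphlines : [
  {
    # Placeholder id, used to select this morphline from the file
    id : morphline1

    # Make the stock Kite SDK commands available
    importCommands : ["org.kitesdk.**"]

    commands : [
      # Split the incoming byte stream into one record per line
      { readLine { charset : UTF-8 } }

      # Log each record at DEBUG level to inspect the transformation
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
EOF
```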

Compile & Package

$ mvn clean compile package -Ppackage

The distribution will be available in the stratio-ingestion-dist/target/ folder. There you will find .deb, .rpm and .tar.gz packages, ready to use depending on your environment. The documentation has more details on installing the product, along with some useful examples for understanding Stratio Ingestion.
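One way to pick the right package after the build is sketched below. The helper `pick_package_ext` is hypothetical and simply checks which packaging tool is installed; the exact package file names are not given in this README, so the example only lists whatever matches in the output folder.

```shell
# Hypothetical helper: choose a package type based on the tooling available.
pick_package_ext() {
  if command -v dpkg >/dev/null 2>&1; then
    echo "deb"
  elif command -v rpm >/dev/null 2>&1; then
    echo "rpm"
  else
    echo "tar.gz"
  fi
}

ext="$(pick_package_ext)"
echo "Looking for *.${ext} under stratio-ingestion-dist/target/"

# List matching packages if the build has already been run.
ls stratio-ingestion-dist/target/*."${ext}" 2>/dev/null \
  || echo "No .${ext} package found; run the Maven build first."
```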

FAQ

Can I use Stratio Ingestion for aggregating data (time-based rollups, for example)?

In our experience this is not a good idea, but you can use Stratio Sparkta for real-time aggregation.

Is Flume Ingestion multipersistence?

Yes, you can write data to JDBC data sources, MongoDB, Apache Cassandra, ElasticSearch and Apache Kafka, among others.

Can I send data to decision-cep-engine?

Yes, we have developed a sink that sends events from Flume to an existing stream in our CEP engine. If the stream does not exist in the engine, the sink creates it.

Where can I find more details about Stratio Ingestion?

You can take a look at our documentation on Confluence.

Changelog

See the changelog for changes.

ingestion's People

Contributors

smola, aargomaniz, mariomgal, gasparms, miguelseg, aaitor, eambrosio, becaresss, eruizgar, witokondoria, adoblas, ml0renz0, dvallejo, compae, cmonteserin-stratio, carlosgarcia-stratio, carlos-verdes, pmadrid-stratio, imoreno-stratio, jgduarte-stratio, inavarroreus, pbedia-stratio, opuertas
