This project forked from hazelcast/hazelcast-jet

Distributed stream and batch processing engine, built on top of Hazelcast.

Home Page: http://jet.hazelcast.org

License: Apache License 2.0

Hazelcast Jet

Join the chat at https://gitter.im/hazelcast/hazelcast-jet


Hazelcast Jet is an open-source, cloud-native, distributed stream and batch processing engine.

Jet is simple to set up. The nodes you start discover each other and form a cluster automatically. You can do the same locally, even on the same machine (your laptop, for example). This is great for quick testing.

With Jet it's easy to build fault-tolerant and elastic data processing pipelines. Jet keeps processing data without loss even if a node fails, and you can add more nodes that immediately start sharing the computation load.

You can embed Jet as part of your application: it's just a single JAR without dependencies. You can also deploy it standalone, as a stream-processing cluster.

Jet also provides a highly available, distributed in-memory data store. You can cache your reference data and enrich the event stream with it, store the results of a computation, or even store the input data you're about to process with Jet.


Start using Jet

Add this to your pom.xml to get the latest Jet as your project dependency:

<dependency>
    <groupId>com.hazelcast.jet</groupId>
    <artifactId>hazelcast-jet</artifactId>
    <version>3.2</version>
</dependency>

Since Jet is embeddable, this is all you need to start your first Jet instance! Read on for a quick example of your first Jet program.

Batch Processing with Jet

Use this code to start an instance of Jet and tell it to perform some computation:

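String path = "books";

// Start an embedded Jet instance in this JVM
JetInstance jet = Jet.newJetInstance();

Pipeline p = Pipeline.create();

// Read every line of every file in the directory
p.readFrom(Sources.files(path))
        // Split each line into lower-case words
        .flatMap(line -> Traversers.traverseArray(line.toLowerCase().split("\\W+")))
        // Drop the empty token produced when a line starts with a separator
        .filter(word -> !word.isEmpty())
        // Count the occurrences of each distinct word
        .groupingKey(word -> word)
        .aggregate(AggregateOperations.counting())
        // Log each (word, count) result
        .writeTo(Sinks.logger());

// Submit the job and block until it completes
jet.newJob(p).join();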

When you run this, point the path variable to some directory with text files in it. Jet will analyze all the files and give you the word frequency distribution in the log output (for each word it will say how many times it appears in the files).

The above was an example of processing data at rest (i.e., batch processing). It's conceptually simpler than stream processing, so we used it as our first example.
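To make the computation concrete, here is a minimal single-JVM sketch of the same word count, written with plain JDK streams instead of the Jet API. It shows only what the pipeline stages compute, not how Jet distributes the work. The class name WordCountSketch, the method countWords, and the sample lines are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Hazelcast Jet.
final class WordCountSketch {

    // Counts how often each word occurs across the given lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Split each lower-cased line on runs of non-word characters
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // split() yields an empty first token when a line starts with a separator
                .filter(word -> !word.isEmpty())
                // Group equal words and count each group; TreeMap keeps the output sorted
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "The quick brown fox",
                "...jumps over the lazy dog");
        countWords(lines).forEach((word, count) -> System.out.println(word + "=" + count));
    }
}
```

The flatMap, filter, and group-and-count steps correspond one-to-one to the flatMap, filter, and groupingKey/aggregate stages of the Jet pipeline above.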

Stream Processing with Jet

For stream processing you need a streaming data source. A simple example is watching a folder of text files for changes and processing each newly appended line. Here's the code you can try out:

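String path = "books";

// Start an embedded Jet instance in this JVM
JetInstance jet = Jet.newJetInstance();

Pipeline p = Pipeline.create();

// Emit an item for each line appended to a file in the watched directory
p.readFrom(Sources.fileWatcher(path))
        // Timestamp each item with the time it enters the pipeline
        .withIngestionTimestamps()
        .setLocalParallelism(1)
        // Split each line into lower-case words and drop empty tokens
        .flatMap(line -> Traversers.traverseArray(line.toLowerCase().split("\\W+")))
        .filter(word -> !word.isEmpty())
        .groupingKey(word -> word)
        // Count each word within consecutive 1000 ms windows
        .window(WindowDefinition.tumbling(1000))
        .aggregate(AggregateOperations.counting())
        .writeTo(Sinks.logger());

// The source is unbounded, so join() blocks for as long as the job runs
jet.newJob(p).join();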

Before running this, make an empty directory and point the path variable to it. While the job is running, copy some text files into the directory and Jet will process them right away.
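The tumbling window in the pipeline above splits the timeline into consecutive, non-overlapping intervals of 1000 ms and counts words separately in each one. The following plain-JDK sketch illustrates that idea under simple assumptions: it is not Jet's implementation, and the class TumblingWindowSketch, its methods, and the sample timestamps are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Hazelcast Jet.
final class TumblingWindowSketch {

    // Start of the tumbling window that contains the given timestamp.
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    // Counts words per window; the outer key is the window start in milliseconds.
    static Map<Long, Map<String, Long>> countPerWindow(
            long[] timestamps, String[] words, long windowSizeMillis) {
        Map<Long, Map<String, Long>> result = new TreeMap<>();
        for (int i = 0; i < words.length; i++) {
            long start = windowStart(timestamps[i], windowSizeMillis);
            // Create the per-window map on first use, then increment the word's count
            result.computeIfAbsent(start, k -> new TreeMap<>())
                  .merge(words[i], 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] timestamps = {100, 950, 1000, 1999, 2500};
        String[] words = {"jet", "jet", "jet", "stream", "jet"};
        countPerWindow(timestamps, words, 1000).forEach((start, counts) ->
                System.out.println("[" + start + ", " + (start + 1000) + ") " + counts));
    }
}
```

Each timestamp belongs to exactly one window, which is why the counts are reported once per word per window rather than as a single running total.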

Features:

  • Constant low latency - predictable latency is a design goal
  • Zero dependencies - single JAR which is embeddable (minimum JDK 8)
  • Cloud Native - with Docker images and Kubernetes support, including Helm Charts
  • Elastic - Jet can scale jobs up and down while running
  • Fault Tolerant - at-least-once and exactly-once processing guarantees
  • In-memory storage - Jet provides robust distributed in-memory storage for caching, enrichment or storing job results
  • Sources and sinks for Apache Kafka, Hadoop, Hazelcast IMDG, sockets, files
  • Dynamic node discovery for both on-premise and cloud deployments

Distribution

You can download the distribution package, which includes command-line tools, from jet.hazelcast.org.

Documentation

See the Hazelcast Jet Reference Manual.

Code Samples

See the examples folder for some examples.

Additional Connectors

See the hazelcast-jet-contrib repository for community-supported connectors and tools.

Architecture

See the architecture and performance pages for more details about Jet's internals and design.

Start Developing Hazelcast Jet

Use Latest Snapshot Release

You can always use the latest snapshot release if you want to try the features currently under development.

Maven snippet:

<repositories>
    <repository>
        <id>snapshot-repository</id>
        <name>Maven2 Snapshot Repository</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
        </snapshots>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.hazelcast.jet</groupId>
        <artifactId>hazelcast-jet</artifactId>
        <version>4.0-SNAPSHOT</version>
    </dependency>
</dependencies>

Build From Source

Requirements

  • JDK 8 or later

To build on Linux or macOS, use:

./mvnw clean package -DskipTests

On Windows, use:

mvnw clean package -DskipTests

Contributions

We encourage pull requests and process them promptly.

To contribute:

Community

The Hazelcast Jet team actively answers questions on Stack Overflow.

You are also encouraged to join the hazelcast-jet mailing list if you are interested in community discussions.

License

Hazelcast Jet is available under the Apache 2 License. Please see the Licensing section for more information.

Copyright

Copyright (c) 2008-2019, Hazelcast, Inc. All Rights Reserved.

Visit www.hazelcast.com for more info.
