
kaiwaehner / kafka-streams-machine-learning-examples


This project contains examples which demonstrate how to deploy analytic models to mission-critical, scalable production environments leveraging Apache Kafka and its Streams API. Models are built with Python, H2O, TensorFlow, Keras, DeepLearning4J and other technologies.

License: Apache License 2.0

Java 99.99% Python 0.01%
kafka kafka-streams kafka-client machine-learning deep-learning open-source h2o h2oai tensorflow deeplearning4j keras keras-tensorflow dl4j ksql java python

kafka-streams-machine-learning-examples's Introduction

Machine Learning + Kafka Streams Examples

This project contains examples which demonstrate how to deploy analytic models to mission-critical, scalable production environments leveraging Apache Kafka and its Streams API. Examples will include analytic models built with TensorFlow, Keras, H2O, Python, DeepLearning4J and other technologies.

Kafka Open Source Ecosystem for a Scalable Mission Critical Machine Learning Infrastructure

Material (Blog Posts, Slides, Videos)

Here is some material about this topic if you want to read about and listen to the theory instead of only doing the hands-on examples:

Use Cases and Technologies

The following examples are already available, including unit tests:
  • Deployment of an H2O GBM model to a Kafka Streams application for prediction of flight delays
  • Deployment of an H2O Deep Learning model to a Kafka Streams application for prediction of flight delays
  • Deployment of a pre-built TensorFlow CNN model for image recognition
  • Deployment of a DL4J model to predict the species of Iris flowers
  • Deployment of a Keras model (trained with TensorFlow backend) using the Import Model API from DeepLearning4J

More sophisticated use cases around Kafka Streams and other technologies will be added over time in this or related GitHub projects. Some ideas:

  • Image Recognition with H2O and TensorFlow (to show the difference between using H2O and using only the low-level TensorFlow APIs)
  • Anomaly Detection with Autoencoders leveraging DeepLearning4J.
  • Cross Selling and Customer Churn Detection using classical Machine Learning algorithms but also Deep Learning
  • Stateful Stream Processing to combine different model execution steps into a more powerful workflow instead of "just" inferencing single events (a good example might be a streaming process with sliding or session windows).
  • Keras to build different models with Python, TensorFlow, Theano and other Deep Learning frameworks under the hood + Kafka Streams as generic Machine Learning infrastructure to deploy, execute and monitor these different models.
Some other GitHub projects with more ML + Kafka content already exist:

The most exciting and powerful example first: Streaming Machine Learning at Scale from 100000 IoT Devices with HiveMQ, Apache Kafka and TensorFlow

Here are some more demos:

Requirements, Installation and Usage

The code is developed and tested on Mac and Linux. Because Kafka does not officially support Windows and does not work well on it, Windows is not tested at all.

Java 8 and Maven 3 are required. Maven will download all required dependencies.

Just download the project and run

            mvn clean package

You can do this in the main directory or in each module separately.

Apache Kafka 2.5 is currently used. The code is also compatible with Kafka and Kafka Streams 1.1 and 2.x.

Please make sure to run the Maven build without any changes first. If it works without errors, you can change library versions, Java version, etc. and see if it still works or if you need to adjust code.

Every example includes an implementation and a unit test. The examples are very simple and lightweight, and no further configuration is needed to build and run them. For this reason, the generated models are also included (which increases the download size of the project).

The unit tests use some Kafka helper classes like EmbeddedSingleNodeKafkaCluster in package com.github.megachucky.kafka.streams.machinelearning.test.utils, so you can run them without any other configuration or Kafka setup. If you want to run the main class of an implementation in package com.github.megachucky.kafka.streams.machinelearning, you need to start a Kafka cluster (with at least one ZooKeeper and one Kafka broker running) and create the required topics. So check out the unit tests first.
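If you do run a main class against a real cluster, its topics must exist first. Below is a minimal sketch using the Kafka `AdminClient`, assuming a single local broker at `localhost:9092`. The topic names `InputTopic` and `OutputTopic` are hypothetical placeholders; check the main class you are running for the names it actually reads from and writes to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

/** Sketch: create the topics a main class needs before starting it. */
public class CreateTopics {

    // HYPOTHETICAL topic names: replace with the ones used by your example class.
    static List<NewTopic> requiredTopics() {
        // 1 partition and replication factor 1 are enough for a single local broker
        return Arrays.asList(
                new NewTopic("InputTopic", 1, (short) 1),
                new NewTopic("OutputTopic", 1, (short) 1));
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        Properties props = new Properties();
        // ASSUMPTION: broker reachable at localhost:9092
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // all().get() blocks until every topic is created, or throws if one fails
            admin.createTopics(requiredTopics()).all().get();
        }
    }
}
```

Running it a second time fails with a "topic already exists" error wrapped in the `ExecutionException`, which is harmless for local experiments.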

Example 1 - Gradient Boosting with H2O.ai for Prediction of Flight Delays

Detailed info in h2o-gbm

Example 2 - Convolutional Neural Network (CNN) with TensorFlow for Image Recognition

Detailed info in tensorflow-image-recognition

Example 3 - Iris Prediction using a Neural Network with DeepLearning4J (DL4J)

Detailed info in dl4j-deeplearning-iris

Example 4 - Python + Keras + TensorFlow + DeepLearning4j

Detailed info in tensorflow-keras

kafka-streams-machine-learning-examples's People

Contributors

ardlema, jukkakarvanen, kaiwaehner, msilb


kafka-streams-machine-learning-examples's Issues

Static class variables used to pass info between stream steps

Kafka_Streams_TensorFlow_Image_Recognition_Example uses static class variables to pass information between the foreach and mapValues steps.

So I expect some rather interesting problems when it is executed with multiple threads.

I have a fix for this, but it is built on top of the changes related to #11.

Could not load the H2O GBM

It shows this error message when I use the java command:

Error: Could not find or load main class com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_MachineLearning_H2O_GBM_Example

java -cp target/kafka-streams-machine-learning-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_MachineLearning_H2O_GBM_Example

Error on Maven test

[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 22.589 s
[INFO] Finished at: 2018-11-22T10:12:38Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20:test (default-test) on project kafka-streams-machine-learning-examples: There are test failures.
[ERROR]
[ERROR] Please refer to /data/home/linuxadmin/project/kafka-streams-machine-learning-examples/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.
[ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /data/home/linuxadmin/project/kafka-streams-machine-learning-examples && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar /data/home/linuxadmin/project/kafka-streams-machine-learning-examples/target/surefire/surefirebooter8594516186798539983.jar /data/home/linuxadmin/project/kafka-streams-machine-learning-examples/target/surefire 2018-11-22T10-12-37_631-jvmRun1 surefire3166622089630056442tmp surefire_01187852086155324446tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /data/home/linuxadmin/project/kafka-streams-machine-learning-examples && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar /data/home/linuxadmin/project/kafka-streams-machine-learning-examples/target/surefire/surefirebooter8594516186798539983.jar /data/home/linuxadmin/project/kafka-streams-machine-learning-examples/target/surefire 2018-11-22T10-12-37_631-jvmRun1 surefire3166622089630056442tmp surefire_01187852086155324446tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:679)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:533)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:279)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:243)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1077)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:907)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:785)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:956)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:192)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

Visualization of prediction output

Hi,
Great software build :)!
Would there be an easy way to attach a visualization to the prediction output?
Many thanks,
Best,
Andrew

TopologyTestDriver based unit tests

The current unit tests contain a copy of the actual implementation instead of testing the actual code in the src folder.

Here is an example of how to use TopologyTestDriver to test the actual implementation:
https://github.com/jukkakarvanen/kafka-streams-machine-learning-examples/pull/1/files

I have not opened this as a pull request because it is built on top of an open pull request:
#10

The same changes could be moved on top of the current branch without the module split.

I can add similar tests for the other classes that have an actual implementation.
There are a couple of tests where the actual implementation class is missing.
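For readers unfamiliar with `TopologyTestDriver`, here is a minimal sketch (not the code from the linked branch) of a test that drives a topology in-process without any broker. It assumes Kafka Streams 2.4 or later (for `TestInputTopic`/`TestOutputTopic`) with `kafka-streams-test-utils` on the classpath. The topology, the topic names and the `"prediction for "` mapping are hypothetical stand-ins for a real example class; the point is that the test calls the same `buildTopology()` that production code would use.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopologyTestSketch {

    // HYPOTHETICAL production topology; the mapValues call stands in for model inference.
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("InputTopic", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(value -> "prediction for " + value)
               .to("OutputTopic", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }

    // Pipes one record through the topology and returns the value written to the output topic.
    static String runOnce(String input) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // required, never contacted
        try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
            TestInputTopic<String, String> in = driver.createInputTopic(
                    "InputTopic", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out = driver.createOutputTopic(
                    "OutputTopic", new StringDeserializer(), new StringDeserializer());
            in.pipeInput("key", input);
            return out.readValue();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce("event-1"));
    }
}
```

In a JUnit test, `runOnce` would be replaced by assertions inside the test method, with the driver created in a setup method and closed in a teardown method.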

Kafka installation

Should we install Kafka first and then run mvn clean package, or directly run mvn clean package?

JUnit tests not working on Windows

The JUnit tests fail on Windows due to https://issues.apache.org/jira/browse/KAFKA-6647

The same problem also seems to be mentioned here:
#4 (comment)

The README also states: "Kafka does not support and work well on Windows, this is not tested at all."

I have used this kind of workaround to get the JUnit tests to work on Windows:
https://github.com/jukkakarvanen/kafka-streams-machine-learning-examples/tree/junit_in_windows

Not sure whether this is worth merging, but if you want to run the tests on Windows, you can use that branch.

NOTE: Only the test code is changed, not the actual classes.

Use `transform` instead of `foreach`

Hey Kai,

I think you can use transform, which returns a new stream: use a similar function, but return the result p as the value, and even with a new key (right now the key is null).

By doing this you also do not need to declare a static String airlineDelayPreduction :)
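A minimal sketch of this suggestion, assuming String keys and values. It swaps in `map` (a stateless counterpart of `transform` that also lets you set a new key) so that each record carries its own prediction downstream; this also avoids the static-variable hand-off raised in the issue about static variables above. `predict` and `newKey` are hypothetical placeholders for the real model call and key choice.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class PredictionInValueSketch {

    // HYPOTHETICAL stand-in for the model call; it returns its result
    // instead of writing it to a static field.
    static String predict(String csvRecord) {
        return "prediction for " + csvRecord;
    }

    // HYPOTHETICAL key choice: the first CSV column of the input record.
    static String newKey(String csvRecord) {
        int comma = csvRecord.indexOf(',');
        return comma < 0 ? csvRecord : csvRecord.substring(0, comma);
    }

    static Topology buildTopology(String inputTopic, String outputTopic) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.String()))
               // key and prediction travel with the record, so stream threads share no state
               .map((key, value) -> KeyValue.pair(newKey(value), predict(value)))
               .to(outputTopic, Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "prediction-in-value-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // ASSUMPTION
        KafkaStreams streams = new KafkaStreams(buildTopology("InputTopic", "OutputTopic"), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

If only the value changes and the key can stay as-is, `mapValues` is the simpler choice, because changing the key can force a repartition before any later grouping or join.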

Error on timestamp

Thank you for the prompt reply yesterday. Now I get this error when I send a message. Here is the error:

Possibly because a pre-0.10 producer client was used to write this record to Kafka without embedding a timestamp, or because the input topic was created before upgrading the Kafka cluster to 0.10+. Use a different TimestampExtractor to process this data.
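As the error message suggests, one option is a different `TimestampExtractor`. Below is a minimal sketch, assuming the records really carry no embedded timestamp: `WallclockTimestampExtractor` (shipped with Kafka Streams) assigns processing time instead. Note that windowed operations then use arrival time rather than event time; upgrading the producer client is the alternative.

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.WallclockTimestampExtractor;

public class TimestampExtractorConfigSketch {

    static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "timestamp-extractor-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // ASSUMPTION
        // Use wall-clock (processing) time instead of the record's embedded timestamp,
        // which producers older than 0.10 do not set.
        props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
                  WallclockTimestampExtractor.class);
        return props;
    }
}
```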

Maven dependency issue

I have developed an H2O machine learning model and am trying to deploy it on the Confluent Kafka 5.2.1 platform (localhost for testing), using the IntelliJ IDE.

pom.xml

<repositories>
    <repository>
        <id>confluent</id>
        <url>https://packages.confluent.io/maven/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>2.3.0-ccs</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.3.0-ccs</version>
    </dependency>

    <!-- For Scala developers -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams-scala_2.11</artifactId>
        <!-- or
        <artifactId>kafka-streams-scala_2.12</artifactId>
        -->
        <version>2.3.0-ccs</version>
    </dependency>

    <!-- Dependencies below are required/recommended only when using Apache Avro. -->
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-avro-serializer</artifactId>
        <version>5.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>1.8.2</version>
    </dependency>
</dependencies>

I copied this from the Confluent website and pasted it into pom.xml.

I get the following warnings when compiling with Maven:

[WARNING] The POM for org.apache.kafka:kafka-streams:jar:2.3.0-ccs is missing, no dependency information available
[WARNING] The POM for org.apache.kafka:kafka-clients:jar:2.3.0-ccs is missing, no dependency information available
[WARNING] The POM for org.apache.kafka:kafka-streams-scala_2.11:jar:2.3.0-ccs is missing, no dependency information available

Please advise. I tried many options but couldn't resolve it. If possible, please guide me.

Error executing example

Hi,

First, congrats on these examples.

I have a problem when trying to execute the example. This is the error:

Exception in thread "main" java.lang.ClassNotFoundException: com.github.megachucky.kafka.streams.machinelearning.modelss.gbm_pojo_test
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_MachineLearning_H2O_GBM_Example.main(Kafka_Streams_MachineLearning_H2O_GBM_Example.java:38)

Do I need to do something special with the POJO?

Thanks,
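The stack trace shows the example loading the generated POJO by name through `Class.forName`, so a `ClassNotFoundException` means the JVM found no class with exactly that fully qualified name on the runtime classpath. It is worth comparing the name in the trace with the `package` declaration of the generated POJO file and checking that the compiled class ends up in the jar. Below is a minimal sketch of that loading step, assuming `h2o-genmodel` and the compiled POJO are on the classpath; `com.example.models.my_gbm_pojo` is a hypothetical name.

```java
import hex.genmodel.GenModel;
import hex.genmodel.easy.EasyPredictModelWrapper;

public class LoadH2oPojoSketch {

    // Instantiates a generated H2O POJO by its fully qualified class name and wraps it
    // for easy prediction. The name must match the POJO's package declaration exactly.
    static EasyPredictModelWrapper load(String modelClassName) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(modelClassName);
        GenModel raw = (GenModel) cls.getDeclaredConstructor().newInstance();
        return new EasyPredictModelWrapper(raw);
    }

    public static void main(String[] args) {
        // HYPOTHETICAL default name; pass the real one as the first argument
        String name = args.length > 0 ? args[0] : "com.example.models.my_gbm_pojo";
        try {
            load(name);
            System.out.println("Loaded " + name);
        } catch (ClassNotFoundException e) {
            System.err.println("Not on the classpath (check package name and build): " + name);
        } catch (ReflectiveOperationException e) {
            System.err.println("Found but could not instantiate " + name + ": " + e);
        }
    }
}
```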
