This project forked from aws-samples/amazon-kinesis-analytics-taxi-consumer

Sample Apache Flink application that can be deployed to Kinesis Analytics for Java. It reads taxi events from a Kinesis data stream, processes and aggregates them, and ingests the result to an Amazon Elasticsearch Service cluster for visualization with Kibana.

Home Page: https://aws.amazon.com/blogs/big-data/build-and-run-streaming-applications-with-apache-flink-and-amazon-kinesis-data-analytics-for-java-applications/

License: Apache License 2.0

Java 100.00%

Amazon Kinesis Analytics Taxi Consumer

To see the sample application in action, execute the following CloudFormation template in your own AWS account. The template first builds the Flink application that analyzes the incoming taxi trips, including the Flink Kinesis Connector that is required to read data from a Kinesis data stream. It then creates the infrastructure and submits the Flink application to KDA for Java.

Launch CloudFormation Stack

The entire process of building the application and creating the infrastructure takes about 15 minutes. Once the CloudFormation stack has been created, the Flink application is deployed as a KDA for Java application and waits for events to arrive in the data stream. Checkpointing is enabled so that the application can seamlessly recover from failures of the underlying infrastructure, and KDA for Java manages the checkpoints on your behalf. In addition, autoscaling is configured so that KDA for Java automatically allocates or removes resources and scales the application, i.e., adapts its parallelism, in response to changes in the incoming traffic.

To populate the Kinesis data stream, we use a Java application that replays a public data set of historic taxi trips made in New York City into the data stream. The Java application has already been downloaded to an EC2 instance that has been provisioned by CloudFormation. You just need to connect to the instance and execute the jar file to start ingesting events into the stream.

Note that all of the following commands, including their correct parameters, can be obtained from the output section of the CloudFormation template that has been executed previously.

$ ssh ec2-user@«Replay instance DNS name»

$ java -jar amazon-kinesis-replay-*.jar -streamName «Kinesis data stream name» -streamRegion «AWS region» -speedup 3600

The speedup parameter determines how much faster the data is ingested into the Kinesis data stream relative to the actual occurrence of the historic events. With a speedup of 3600, the Java application ingests an hour of historic data within one second, which results in a throughput of roughly 13k events and 6 MB of data per second and hence completely saturates the Kinesis data stream (more on this later).
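The arithmetic behind the speedup parameter can be sketched in a few lines of Java. This is an illustration only, not the code of the replay application: the class name, method names, and the example timestamp are hypothetical. The shard estimate assumes the documented Kinesis per-shard write limits of 1,000 records or 1 MiB per second and uses the approximate throughput figures quoted above.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative sketch; names are hypothetical and not taken from the replay tool. */
final class ReplaySpeedup {

    // Per-shard write limits of a Kinesis data stream.
    static final long RECORDS_PER_SHARD_PER_SECOND = 1_000;
    static final long BYTES_PER_SHARD_PER_SECOND = 1_048_576; // 1 MiB

    /** Delay after replay start at which a historic event is sent, given a speedup factor. */
    static Duration replayDelay(Instant firstEventTime, Instant eventTime, long speedup) {
        Duration historicOffset = Duration.between(firstEventTime, eventTime);
        return historicOffset.dividedBy(speedup);
    }

    /** Smallest shard count that can absorb the given write rate without throttling. */
    static long minShards(long recordsPerSecond, long bytesPerSecond) {
        long byRecords = ceilDiv(recordsPerSecond, RECORDS_PER_SHARD_PER_SECOND);
        long byBytes = ceilDiv(bytesPerSecond, BYTES_PER_SHARD_PER_SECOND);
        return Math.max(byRecords, byBytes);
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        Instant first = Instant.parse("2018-01-01T00:00:00Z"); // hypothetical timestamp
        Instant oneHourLater = first.plus(Duration.ofHours(1));

        // An event that happened one hour into the data set is replayed after one second.
        System.out.println(replayDelay(first, oneHourLater, 3600).toMillis() + " ms");

        // Roughly 13k events and 6 MB per second, as quoted in the text.
        System.out.println(minShards(13_000, 6_000_000) + " shards");
    }
}
```

At these rates the record limit, not the byte limit, is the binding constraint, so a stream with fewer shards than the estimate would be throttled.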

You can then go ahead and inspect the derived data through the Kibana dashboard that has been created, or you can create your own visualizations to explore the data in Kibana.

https://«Elasticsearch endpoint»/_plugin/kibana/app/kibana#/dashboard/nyc-tlc-dashboard

The prepared Kibana dashboard contains a heatmap and a line graph. The heatmap visualizes locations where taxis are currently requested, and it shows that the highest demand for taxis is in Manhattan. Moreover, the airports JFK and LaGuardia are also spots on the map where substantially more rides are requested than in their immediate surroundings. The line graph visualizes the average trip duration to these two airports, and you can see how it steadily increases throughout the day until it abruptly drops in the evening.
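The kind of aggregation behind the line graph can be approximated in plain Java. This is a minimal sketch under stated assumptions, not the Flink job itself: the Trip class, its fields, and the one-hour window size are hypothetical. It groups trips into fixed, non-overlapping time windows and averages the trip duration per window for one airport.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative sketch; the actual application performs its aggregation in Flink. */
final class TripDurationAverages {

    /** Hypothetical, simplified trip event. */
    static final class Trip {
        final String airport;
        final long dropoffEpochSeconds;
        final long durationSeconds;

        Trip(String airport, long dropoffEpochSeconds, long durationSeconds) {
            this.airport = airport;
            this.dropoffEpochSeconds = dropoffEpochSeconds;
            this.durationSeconds = durationSeconds;
        }
    }

    /** Average trip duration per window start time, for trips to the given airport. */
    static Map<Long, Double> averageByWindow(List<Trip> trips, String airport, long windowSeconds) {
        return trips.stream()
                .filter(t -> t.airport.equals(airport))
                .collect(Collectors.groupingBy(
                        // Align each drop-off time to the start of its window.
                        t -> (t.dropoffEpochSeconds / windowSeconds) * windowSeconds,
                        TreeMap::new,
                        Collectors.averagingLong(t -> t.durationSeconds)));
    }

    public static void main(String[] args) {
        List<Trip> trips = List.of(
                new Trip("JFK", 0, 1800),
                new Trip("JFK", 1800, 2400),
                new Trip("JFK", 3600, 3000),
                new Trip("LGA", 100, 1200));

        // One-hour windows, keyed by window start in epoch seconds.
        System.out.println(averageByWindow(trips, "JFK", 3600));
    }
}
```

A streaming engine such as Flink computes the same result incrementally as events arrive, rather than over a complete list as done here.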

Kibana Dashboard Screen Shot

Note that the Elasticsearch cluster is configured to accept connections from the IP address range specified as a parameter of the CloudFormation template. For production workloads, you should further tighten the security of your Elasticsearch domain, for instance, by using Amazon Cognito for Kibana access control.

License

This library is licensed under the Apache 2.0 License.

Contributors

sthm
