
awslabs / amazon-kinesis-client-ruby

A Ruby interface for the Amazon Kinesis Client Library. Allows developers to easily create robust applications to process Amazon Kinesis streams in Ruby.

License: Apache License 2.0

Ruby 100.00%

amazon-kinesis-client-ruby's Introduction

Amazon Kinesis Client Library for Ruby

This package provides an interface to the Amazon Kinesis Client Library's (KCL) MultiLangDaemon for the Ruby language. Developers can use the Amazon KCL to build distributed applications that process streaming data reliably at scale. The Amazon KCL takes care of many of the complex tasks associated with distributed computing, such as load-balancing across multiple instances, responding to instance failures, checkpointing processed records, and reacting to changes in stream volume. This package wraps and manages the interaction with the MultiLangDaemon, which is part of the Amazon KCL for Java, so that developers can focus on implementing their record processor executable. A record processor in Ruby typically looks something like:

#! /usr/bin/env ruby

require 'aws/kclrb'

class SampleRecordProcessor < Aws::KCLrb::V2::RecordProcessorBase
  def init_processor(initialize_input)
    # initialize
  end

  def process_records(process_records_input)
    # process batch of records
  end

  def lease_lost(lease_lost_input)
    # lease was lost, cleanup
  end

  def shard_ended(shard_ended_input)
    # shard has ended, cleanup
  end

  def shutdown_requested(shutdown_requested_input)
    # shutdown has been requested
  end
end

if __FILE__ == $0
  # Start the main processing loop
  record_processor = SampleRecordProcessor.new
  driver = Aws::KCLrb::KCLProcess.new(record_processor)
  driver.run
end
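As a rough illustration of what the body of process_records might do, the sketch below decodes a batch of records and tracks the last sequence number seen. It assumes each record is a Hash with a Base64-encoded 'data' field and a 'sequenceNumber' field; the helper name decode_batch is hypothetical and not part of the gem.

```ruby
# Hypothetical helper, not part of aws-kclrb: decodes a batch of records and
# returns the decoded payloads plus the sequence number of the last record.
# Assumption: each record is a Hash with a Base64-encoded 'data' field and a
# 'sequenceNumber' field.
def decode_batch(records)
  payloads = []
  last_seq = nil
  records.each do |record|
    # String#unpack1('m') decodes Base64 using only core Ruby.
    payloads << record.fetch('data').unpack1('m')
    last_seq = record.fetch('sequenceNumber')
  end
  [payloads, last_seq]
end

# Example batch built by hand; Array#pack('m0') Base64-encodes without newlines.
records = [
  { 'data' => ['hello'].pack('m0'), 'sequenceNumber' => '1' },
  { 'data' => ['world'].pack('m0'), 'sequenceNumber' => '2' }
]
payloads, last_seq = decode_batch(records)
puts payloads.inspect
puts last_seq
```

Inside process_records, a helper like this would presumably be fed the records from process_records_input, with the returned sequence number passed to the checkpointer only when it is not nil. How the input object exposes its records and checkpointer is best confirmed against samples/sample_kcl.rb.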

Before You Get Started

Before running the samples, you'll want to make sure that your environment is configured to allow the samples to use your AWS Security Credentials.

By default the samples use the DefaultAWSCredentialsProviderChain so you'll want to make your credentials available to one of the credentials providers in that provider chain. There are several ways to do this such as providing a ~/.aws/credentials file, or if you're running on Amazon EC2, you can associate an IAM role with your instance with appropriate access.

For questions regarding Amazon Kinesis Service and the client libraries please check the official documentation as well as the Amazon Kinesis Forums.

Running the Sample

Using the Amazon KCL for Ruby package requires the MultiLangDaemon which is provided by the Amazon KCL for Java. Rake tasks are provided to start the sample application(s) and download all the required dependencies.

The sample application consists of two components:

  • A data producer (samples/sample_kcl_producer.rb): this script creates an Amazon Kinesis stream and starts putting random records into it.
  • A data processor (samples/sample_kcl.rb): this script is invoked by the MultiLangDaemon and consumes the data from the Amazon Kinesis stream and stores it into files (1 file per shard).

The following defaults are used in the sample application:

  • Stream name: kclrbsample
  • Region: us-east-1
  • Number of shards: 2
  • Amazon KCL application name: RubyKCLSample
  • Amazon DynamoDB table for KCL application: RubyKCLSample
  • Amazon CloudWatch metrics namespace for KCL application: RubyKCLSample

Running the Data Producer

To run the data producer, run the following commands:

    cd samples
    rake run_producer

Notes

  • The AWS Ruby SDK gem for Kinesis needs to be installed as a pre-requisite. To install, run:

        sudo gem install aws-sdk-kinesis
  • The script samples/sample_kcl_producer.rb takes several parameters that you can use to customize its behavior. To see the available options, run:

        samples/sample_kcl_producer.rb --help

Running the Data Processor

To run the data processor, run the following commands:

    cd samples
    rake run properties_file=sample.properties

Notes

  • The JAVA_HOME environment variable needs to point to a valid JVM.

  • The rake task invokes the MultiLangDaemon passing to it the properties file samples/sample.properties. This file contains the information needed to bootstrap the sample application, e.g.

    • executableName = samples/sample_kcl.rb
    • streamName = kclrbsample
    • applicationName = RubyKCLSample
    • regionName = us-east-1
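To sanity-check a properties file before handing it to the MultiLangDaemon, a small reader along the following lines can help. This is a minimal sketch that handles only plain "key = value" lines and comment lines; the helper name parse_properties is hypothetical, and the full Java properties syntax (line continuations, ":" separators, escapes) is deliberately not covered.

```ruby
# Hypothetical helper: parses simple "key = value" lines into a Hash.
# Only a subset of the Java properties format is handled (an assumption made
# for brevity): blank lines and lines starting with '#' or '!' are skipped.
def parse_properties(text)
  text.each_line.with_object({}) do |line, props|
    line = line.strip
    next if line.empty? || line.start_with?('#', '!')

    key, value = line.split('=', 2)
    next if value.nil? # no '=' on this line, ignore it

    props[key.strip] = value.strip
  end
end

sample = <<~PROPS
  # values taken from the README
  executableName = samples/sample_kcl.rb
  streamName = kclrbsample
  applicationName = RubyKCLSample
  regionName = us-east-1
PROPS

parse_properties(sample).each { |key, value| puts "#{key} -> #{value}" }
```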

Cleaning Up

This sample application creates a real Amazon Kinesis stream, sends real data to it, and creates a real DynamoDB table to track the Amazon KCL application state, thus potentially incurring AWS costs. Once done, you can log in to the AWS Management Console and delete these resources. Specifically, the sample application will create the following in your default AWS region:

  • an Amazon Kinesis Data Stream named kclrbsample
  • an Amazon DynamoDB table named RubyKCLSample

Running on Amazon EC2

Running on Amazon EC2 is simple. Assuming you are already logged into an Amazon EC2 instance running Amazon Linux, the following steps will prepare your environment for running the sample application. Note the version of Java that ships with Amazon Linux can be found at /usr/bin/java and should be 1.7 or greater.

    # install some prerequisites if missing
    sudo yum install gcc patch git ruby rake rubygems ruby-devel
    # install the AWS Ruby SDK (prerequisite for the producer)
    sudo gem install aws-sdk aws-kclrb
    # clone the git repository to work with the samples
    git clone https://github.com/awslabs/amazon-kinesis-client-ruby.git kclrb
    # run the sample
    cd kclrb/samples
    rake run_producer
    # ... and in another terminal
    rake run properties_file=sample.properties

Under the Hood - What You Should Know about Amazon KCL's MultiLangDaemon

Amazon KCL for Ruby uses Amazon KCL for Java internally. We have implemented a Java-based daemon, called the MultiLangDaemon that does all the heavy lifting. Our approach has the daemon spawn the user-defined record processor script/program as a sub-process. The MultiLangDaemon communicates with this sub-process over standard input/output using a simple protocol, and therefore the record processor script/program can be written in any language.
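To make the standard input/output exchange more concrete, here is a minimal sketch of a child-process loop that reads one JSON message per line and acknowledges each one. The "action" field matches the messages shown in this project's error reports; the shape of the reply ("status" with "responseFor") is an assumption about the protocol, and serve_actions is a hypothetical name. The real loop lives in Aws::KCLrb::KCLProcess.

```ruby
require 'json'
require 'stringio'

# Illustrative only: reads one JSON message per line from `input`, yields the
# action name and the parsed message to the caller, then writes an
# acknowledgement line to `output`.
# Assumption: the reply has the form {"action":"status","responseFor":...}.
def serve_actions(input, output)
  input.each_line do |line|
    line = line.strip
    next if line.empty?

    message = JSON.parse(line)
    action = message.fetch('action')
    yield action, message if block_given?

    output.puts JSON.generate({ 'action' => 'status', 'responseFor' => action })
    output.flush
  end
end

# Simulate the daemon side with in-memory IO objects.
input = StringIO.new(%({"action":"initialize","shardId":"shardId-123"}\n))
output = StringIO.new
serve_actions(input, output) do |action, message|
  # Diagnostics go to stderr because stdout carries the protocol itself.
  warn "received #{action} for #{message['shardId']}"
end
puts output.string
```

Because standard output is the protocol channel in this design, a record processor that prints debugging text to it could plausibly corrupt the exchange, which is why the sketch logs to standard error.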

At runtime, there will always be a one-to-one correspondence between a record processor, a child process, and an Amazon Kinesis Shard. The MultiLangDaemon will make sure of that, without any need for the developer to intervene.

In this release, we have abstracted these implementation details away and exposed an interface that enables you to focus on writing record processing logic in Ruby. This approach enables Amazon KCL to be language agnostic, while providing identical features and a similar parallel processing model across all languages.

See Also

Release Notes

Release 2.1.1 (February 21, 2023)

  • #69 Include pom.xml in the gemspec

Release 2.1.0 (January 12, 2023)

Release 2.0.0 (February 26, 2019)

  • Added support for Enhanced Fan-Out.
    Enhanced Fan-Out provides dedicated throughput per stream consumer, and uses an HTTP/2 push API (SubscribeToShard) to deliver records with lower latency.
  • Updated the Amazon Kinesis Client Library for Java to version 2.1.2.
  • Added version 2 of the RecordProcessorBase which supports the new ShardRecordProcessor interface
    • The shutdown method from version 1 has been replaced by lease_lost and shard_ended.
    • Added the lease_lost method which is invoked when a lease is lost.
      lease_lost replaces shutdown(checkpointer, 'ZOMBIE').
    • Added the shard_ended method which is invoked when all records from a split or merge have been processed.
      shard_ended replaces shutdown(checkpointer, 'TERMINATE').
    • Added an optional method, shutdown_requested, which provides the record processor a last chance to checkpoint during the Amazon Kinesis Client Library shutdown process before the lease is canceled.
      • To control how long the Amazon Kinesis Client Library waits for the record processors to complete shutdown, add timeoutInSeconds=<seconds to wait> to your properties file.
  • Updated the AWS Java SDK version to 2.4.0
  • MultiLangDaemon now provides logging using Logback.
    • MultiLangDaemon supports custom configurations for logging via a Logback XML configuration file.
    • The example Rakefile supports setting the logging configuration by adding log_configuration=<log configuration file> to the Rake command line.
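For anyone porting a version 1 processor, the replacement of shutdown by lease_lost and shard_ended can be summarised as a lookup from the old reason string to the new method name. The sketch below is only a migration aid under that reading of the notes above; the constant and helper names are hypothetical and do not exist in the gem.

```ruby
# Hypothetical migration aid: maps the reason passed to the version 1
# shutdown method onto the version 2 method that replaces it.
V1_REASON_TO_V2_METHOD = {
  'ZOMBIE' => :lease_lost,     # the lease was taken over by another worker
  'TERMINATE' => :shard_ended  # all records of a split or merged shard were processed
}.freeze

def v2_method_for(reason)
  V1_REASON_TO_V2_METHOD.fetch(reason) do
    raise ArgumentError, "unknown shutdown reason: #{reason.inspect}"
  end
end

puts v2_method_for('ZOMBIE')
puts v2_method_for('TERMINATE')
```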

Release 1.0.1 (January 19, 2017)

Release 1.0.0 (December 30, 2014)

  • aws-kclrb gem which exposes an interface to allow implementation of record processors in Ruby using the Amazon KCL's MultiLangDaemon
  • samples directory contains a sample producer and processing applications using the Amazon KCL for Ruby library.

License

This library is licensed under the Apache 2.0 License.

amazon-kinesis-client-ruby's People

Contributors

brendan-p-lynch, cory-bradshaw, davidor, dependabot[bot], hyandell, jpeddicord, leifg, lucienlu-aws, manango, pfifer, rmahfoud, sahilpalvia, thesnicketylemon, witoff, zengyu714


amazon-kinesis-client-ruby's Issues

--shards option is not working correctly in sample producer

The --shards option of the sample producer is not working as expected.

When the number of shards is specified like this for example: "./sample_kcl_producer.rb --stream 'myStream' --shards 1", instead of creating or using the stream 'myStream', the program creates a stream called "1", or uses it if it exists already.

Streams are always created with the default number of shards (2), because the option --shards does not work.

This happens because of a bug in this line of code: https://github.com/awslabs/amazon-kinesis-client-ruby/blob/master/samples/sample_kcl_producer.rb#L114
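For comparison, here is a minimal sketch of option parsing in which each flag writes to its own key, so --shards cannot overwrite the stream name. The flag names come from the report above and the defaults from the README; the function name and the options Hash are hypothetical and are not the producer's actual code.

```ruby
require 'optparse'

# Hypothetical parser, not the code in samples/sample_kcl_producer.rb.
# Each option stores its value under a distinct key.
def parse_producer_options(argv)
  options = { stream: 'kclrbsample', shards: 2 }
  OptionParser.new do |opts|
    opts.on('--stream STREAM_NAME', String) { |name| options[:stream] = name }
    # Integer coercion rejects non-numeric shard counts at parse time.
    opts.on('--shards SHARD_COUNT', Integer) { |count| options[:shards] = count }
  end.parse(argv)
  options
end

p parse_producer_options(%w[--stream myStream --shards 1])
```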

timeout option is not working in sample producer

The sample producer crashes when the timeout option is used.

This is the error I get: sample_kcl_producer.rb:124:in `block (2 levels) in <main>': undefined local variable or method `s' for main:Object (NameError).

Usage question regarding the pom file

The pom.xml file is not packaged in the gemspec.

I was expecting to pull that from the gem and install the required dependencies from there for my environments.

Is that an oversight or should I copy the pom file to my application and hard lock the version of this gem that I depend on?

run producer doesn't work

when i run rake run_producer I get

..
.rvm/rubies/ruby-2.1.3/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- aws/kinesis (LoadError)
    from /Users/asdf/.rvm/rubies/ruby-2.1.3/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from /Users/asdf/workspace/expeditions/amazon-kinesis-client-ruby/samples/sample_kcl_producer.rb:16:in `<main>'

do note, instead of installing via

sudo gem install aws-sdk

I am using RVM and I did a

gem install aws-sdk

Upgrade Kinesis client

Are there any plans for upgrading the core kinesis library version to the latest?

We are looking to use idleTimeBetweenReadsInMillis, but it seems it is ignored till version 2.3.3

Supported Native Ruby KCLRB

howdy all, we love the idea of using kclrb but this unsupported java/ruby frankenstein is becoming quite painful for us. there hasn't been a meaningful commit here in nearly a year, there are no tests, new kinesis features are not available and the java code cannot be easily instrumented, monitored or updated. if anyone is inclined to publish a fault tolerant, distributed at-least once processing library natively in ruby we will happily support the development of this. please get in touch directly if interested.

Library broken? RSpec tests some are failing.

Need to know if this library still works. I just tried to run the RSpec tests and I noticed that some are failing (see logs below).

Does anyone know any sample app that works using this library? Let me know. Thank you.

.......FF

Failures:

  1) Aws::KCLrb::KCLProcess#run should respond to each action by invoking the corresponding processor's method and write a status message to the output IO
     Failure/Error: if processor.version == 1
       #<Double Aws::KCLrb::RecordProcessorBase> received unexpected message :version with (no args)
     # ./lib/aws/kclrb/kcl_process.rb:28:in `initialize'
     # ./spec/kcl_process_spec.rb:50:in `new'
     # ./spec/kcl_process_spec.rb:50:in `block (3 levels) in <module:KCLrb>'

  2) Aws::KCLrb::KCLProcess#run should process a normal stream of actions and produce expected output
     Failure/Error: raise MalformedAction.new("Action '#{action}': #{ke.message}")
     
     Aws::KCLrb::MalformedAction:
       Action '{"action"=>"initialize", "shardId"=>"shardId-123"}': key not found: "sequenceNumber"
     # ./lib/aws/kclrb/kcl_process.rb:82:in `rescue in process_action'
     # ./lib/aws/kclrb/kcl_process.rb:59:in `process_action'
     # ./lib/aws/kclrb/kcl_process.rb:41:in `run'
     # ./spec/kcl_process_spec.rb:84:in `block (3 levels) in <module:KCLrb>'
     # ------------------
     # --- Caused by: ---
     # KeyError:
     #   key not found: "sequenceNumber"
     #   ./lib/aws/kclrb/kcl_process.rb:65:in `fetch'

Finished in 0.01498 seconds (files took 0.13833 seconds to load)
9 examples, 2 failures

Failed examples:

rspec ./spec/kcl_process_spec.rb:37 # Aws::KCLrb::KCLProcess#run should respond to each action by invoking the corresponding processor's method and write a status message to the output IO
rspec ./spec/kcl_process_spec.rb:58 # Aws::KCLrb::KCLProcess#run should process a normal stream of actions and produce expected output

Struggling to get example consumer working

I've been trying to get the example consumer working correctly across our multiple AWS accounts. I've tried with both the Node.js client and now the Ruby client; both throw the same error after a few minutes:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Reached end of STDIN of child process for shard shardId-000000000000 so won't be able to return a message.

This is the default code and nothing has been changed but the connection/stream details. Anyone else having issues getting the samples to run?

Community contributions?

Do you accept community contributions? If so, are there guidelines?

I have started using this, and we have had the need to subclass and overwrite a private method in KCLProcess for catchall error reporting. I was thinking of writing a PR to upstream a hook to allow for a callable error handler globally.

Would you accept such a contribution?

How does one opt-out of enhanced fan out?

I scanned the example properties file and it didn't seem clear... if I want to use this with the old GetRecords behavior instead of SubscribeToShard, how does one configure the kcl to do that?

Native Ruby KCL

Hello,

Are there plans to develop a native Ruby Kinesis Client Library? Having the Java library running Ruby code when you have a simple Rails application is very far from optimal. It makes Docker instances heavy and requires lots of supporting code.

Thanks,
Henadzi

GracefulShutdown fails

I tried this with both my application and the sample from this repo. Every time I try to stop the application, I see the following message:

2019-04-04 10:36:23,894 [RequestedShutdownThread] INFO  s.a.k.c.GracefulShutdownCoordinator$GracefulShutdownCallable [NONE] - Shutdown completed, but shutdownCompleteLatch still had outstanding 2 with a current value of 2. shutdownComplete: false -- Consumer Map: 0

I suppose application's stopping properly, but KCL still waits for something. Is it OK?

aws-sdk 2.0

How can we make this work with the aws-sdk 2.0 gem? These don't seem to be compatible yet. Any help appreciated.

Thank you

KCL not resilient to network issues

We frequently see the below error that takes down our nodes: The kcl should be more resilient on a box that's meant to process a firehose. What's worse, is that our error handling & instrumentation does not pick up java errors -- we'd really like to support a native ruby implementation.

at java.lang.Thread.run(Thread.java:745)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:23)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:48)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.call(ProcessTask.java:96)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.getRecords(ProcessTask.java:186)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisDataFetcher.getRecords(KinesisDataFetcher.java:69)
at com.amazonaws.services.kinesis.clientlibrary.proxies.MetricsCollectingKinesisProxyDecorator.get(MetricsCollectingKinesisProxyDecorator.java:72)
at com.amazonaws.services.kinesis.clientlibrary.proxies.KinesisProxy.get(KinesisProxy.java:147)
at com.amazonaws.services.kinesis.AmazonKinesisClient.getRecords(AmazonKinesisClient.java:592)
at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2130)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:245)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:417)
at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:730)
at com.amazonaws.http.JsonResponseHandler.handle(JsonResponseHandler.java:41)
at com.amazonaws.http.JsonResponseHandler.handle(JsonResponseHandler.java:95)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:671)
at com.fasterxml.jackson.core.JsonFactory.createJsonParser(JsonFactory.java:831)
at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1182)
at com.fasterxml.jackson.core.JsonFactory._createJsonParser(JsonFactory.java:1191)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:226)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:129)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:505)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:918)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
at sun.security.ssl.InputRecord.read(InputRecord.java:532)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketTimeoutException: Read timed out
INFO: Unable to execute HTTP request: Read timed out

KCL 2.0 support?

KCL version 2.0 has been released. Any plans for updating other clients (and this one) to support it? I'm so interested in the enhanced fan-out feature.

Is this abandonware?

PRs are not being responded to, and there have not been any updates in 8 months. If we are evaluating streaming solutions for our Ruby/Rails application is it fair to take this as a negative signal?

Set log level from command line

Is it possible to add documentation on how to set the log level of the Consumer? We'd like to log WARN and higher, we find that INFO is too verbose for us.

Request for clarification: should we use 1.0.2 or master?

👋 hi there. We just started using this. Seems really great so far! We followed the examples that are on master, but then saw some logs related to shutdownRequested, and noticed that master is slightly behind the latest gem release (1.0.2), and the main new thing is that support has been added to handle that action.

For now, we're planning to use the code from master, and are pulling the gem in via git rather than rubygems. If master is stable, will you please consider releasing it at 1.0.3?

Thanks!

Diff: v1.0.1...03be111

example does not work in eu-central-1

if i change region to eu-central-1 i get

 Stream testme under account XXX not found. (Service: AmazonKinesis; Status Code: 400; Error Code: ResourceNotFoundException; Request ID: 5cd33d33-f1cb-11e4-9fa7-e5838a497e54)

even though this stream in frankfurt exists. everything works fine if i use eu-west-1
might be a bug in amazon-kinesis-client
