aserrallerios / kcl-akka-stream

Custom Akka Stream Sources and Flows to interact with Kinesis streams using the Kinesis Client Library

License: Other

Scala 89.79% Java 10.21%
deprecated

kcl-akka-stream's People

Contributors

aserrallerios, cattail, julianhowarth, kusamakura, sullis

kcl-akka-stream's Issues

Upgrade to KCL 2.x

Are there any plans to upgrade the version of the KCL used? 2.x adds some additional features that we'd be interested in using.

We'll probably try to build a version that works anyway, but don't want to repeat work that's already being done.

Cannot import in gradle

I want to use this library from Java, so I declared it in my build.gradle as
compile group: 'aserrallerios', name: 'kcl-akka-stream_2.12', version: '0.6'

but my build fails with: Could not resolve aserrallerios:kcl-akka-stream_2.12:0.6.

Any help?

Record processor doesn't checkpoint when shard has terminated

I think the checkpointer should be called here, without arguments:
https://github.com/aserrallerios/kcl-akka-stream/blob/master/src/main/scala/aserralle/akka/stream/kcl/IRecordProcessor.scala#L56

As indicated in the docs, this tells the application that all records on the shard have been processed and it is safe to mark as such in DynamoDB.

For a split or merge operation, the KCL won't start processing the new shards until the processors for the original shards have called checkpoint to signal that all processing on the original shards is complete.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-implementation-app-java.html

The problem is that, in my case, a shard split caused the end of each shard to be reached. But when shutdown is called without a final checkpoint, KCL throws the following error and doesn't let the application progress:

Sep 27 20:48:38 ip-172-31-81-83 java[12490]: 20:48:38.826 ERROR [RecordProcessor-0057] c.a.s.k.c.lib.worker.ShutdownTask - Application exception.
Sep 27 20:48:38 ip-172-31-81-83 java[12490]: java.lang.IllegalArgumentException: Application didn't checkpoint at end of shard shardId-000000000130

Thanks for the help.
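
A minimal sketch of the suggested change, assuming the KCL 1.x `v2.IRecordProcessor` interface that this library builds on. The names `ShardEnd` and `ShardEndCheckpointingProcessor` are hypothetical and are not part of kcl-akka-stream; the point is only that a `TERMINATE` shutdown triggers a final no-argument checkpoint, while other shutdown reasons do not.

```scala
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessor
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason
import com.amazonaws.services.kinesis.clientlibrary.types.{InitializationInput, ProcessRecordsInput, ShutdownInput}

// Hypothetical helper: only a TERMINATE shutdown means the shard has ended.
object ShardEnd {
  def shouldCheckpoint(reason: ShutdownReason): Boolean =
    reason == ShutdownReason.TERMINATE
}

// Hypothetical processor, for illustration only.
class ShardEndCheckpointingProcessor extends IRecordProcessor {
  override def initialize(input: InitializationInput): Unit = ()

  // Record handling is omitted; a real processor would forward records downstream.
  override def processRecords(input: ProcessRecordsInput): Unit = ()

  override def shutdown(input: ShutdownInput): Unit =
    // At shard end, checkpoint without arguments so the KCL can mark the
    // shard as fully processed and move on to the child shards.
    if (ShardEnd.shouldCheckpoint(input.getShutdownReason))
      input.getCheckpointer.checkpoint()
}
```

For `ZOMBIE` (lease lost) the processor deliberately does not checkpoint, since another worker may already own the shard.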

Error in README.md

Hi,

I just noticed a small error in the code example:

val checkpointSettings = KinesisWorkerCheckpointSettings(100, 30 seconds)

KinesisWorkerSource(builder, workerSourceSettings)
  .via(KinesisWorker.checkpointRecordsFlow(checkpointSettings))
  .to(Sink.ignore)

KinesisWorkerSource(builder, workerSourceSettings).to(
  KinesisWorker.checkpointRecordsSink(checkpointSettings))

should be

val checkpointSettings = KinesisWorkerCheckpointSettings(100, 30 seconds)

KinesisWorkerSource(builder, workerSourceSettings)
  .via(KinesisWorkerSource.checkpointRecordsFlow(checkpointSettings))
  .to(Sink.ignore)

KinesisWorkerSource(builder, workerSourceSettings).to(
  KinesisWorkerSource.checkpointRecordsSink(checkpointSettings))

Take care and thanks for sharing your code

Pierre

Failure rather than backpressure when multiple shards

When consuming a Kinesis stream with multiple shards, multiple worker threads are started (one per shard), each of which attempts to write to the underlying SourceQueue. If the downstream consumer is slow, the queue backpressures by not completing the returned Future until it has capacity.

Unfortunately, when multiple threads write to the SourceQueue, the first one to hit the full queue waits on the returned Future, but a subsequent one is immediately handed a failed Future and the flow terminates.

The test below shows the issue:

In KinesisWorkerContext:

    // Additional processor to be used by a separate shard worker
    var recordProcessor2: v2.IRecordProcessor = _
    ...
    recordProcessor2 = x.createProcessor()

Then add a test:

"KinesisWorker Source" must {
    "not drop messages in case of back-pressure with multiple shard workers" in new KinesisWorkerContext
                                                        with TestData {
      recordProcessor.initialize(initializationInput)
      recordProcessor2.initialize(initializationInput.withShardId("shard2"))

      for (i <- 1 to 5) { // 5 records per shard, 10 in total (the buffer size is 10)
        val record = org.mockito.Mockito.mock(classOf[Record])
        when(record.getSequenceNumber).thenReturn(i.toString)
        recordProcessor.processRecords(recordsInput.withRecords(List(record).asJava))
        recordProcessor2.processRecords(recordsInput.withRecords(List(record).asJava))
      }

      //expect to consume all 10 across both shards
      for (_ <- 1 to 10) sinkProbe.requestNext()

      // Each shard is assigned its own worker thread, so we get messages
      // from each thread simultaneously.
      def simulateWorkerThread(rp: v2.IRecordProcessor): Future[Unit] = {
        Future {
          for (i <- 1 to 25) { // 25 records, more than the buffer size of 10
            val record = org.mockito.Mockito.mock(classOf[Record])
            when(record.getSequenceNumber).thenReturn(i.toString)
            rp.processRecords(recordsInput.withRecords(List(record).asJava))
          }
        }
      }

      //send another batch to exceed the queue size - this is shard 1
      simulateWorkerThread(recordProcessor)

      //send another batch to exceed the queue size - this is shard 2
      simulateWorkerThread(recordProcessor2)

      //expect to consume all 25 with slow consumer
      for (_ <- 1 to 25) {
        sinkProbe.requestNext()
        Thread.sleep(100)
      }

      killSwitch.shutdown()
      sinkProbe.expectComplete()
    }
}

Trapping the exception, you can see:

java.lang.IllegalStateException: You have to wait for previous offer to be resolved to send another request

I think that, rather than all workers sharing a single SourceQueue, each worker will need its own queue, with a MergeHub combining them into a single stream of records. However, it may be simpler to achieve this with a custom GraphStage.
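
A rough sketch of the per-shard queue idea, assuming Akka 2.6 (where an implicit `ActorSystem` supplies the materializer). `PerShardQueues`, `run` and the buffer sizes are hypothetical and only illustrate the wiring; this is not the library's implementation. Each simulated shard worker owns a private `Source.queue`, so it is the only caller of `offer` on that queue, and all queues feed one `MergeHub`.

```scala
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, MergeHub, Sink, Source}

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PerShardQueues {
  // Hypothetical demo: two "shards" each push 25 records through their own queue.
  def run(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("per-shard-queues")
    // Blocking producers run on the global pool, not on the stream dispatcher.
    implicit val workers: ExecutionContext = ExecutionContext.global

    try {
      // One hub for the whole stream; its materialized Sink accepts many producers.
      val (hubSink, received) =
        MergeHub.source[String](perProducerBufferSize = 16)
          .take(50)
          .toMat(Sink.seq)(Keep.both)
          .run()

      def simulateWorkerThread(shardId: String): Future[Unit] = {
        // A private queue per shard: no two threads ever offer to the same queue.
        val queue =
          Source.queue[String](10, OverflowStrategy.backpressure).to(hubSink).run()
        Future {
          // Wait for each offer to resolve before sending the next record.
          for (i <- 1 to 25)
            Await.result(queue.offer(s"$shardId-$i"), 10.seconds)
        }
      }

      val producers = Seq("shard1", "shard2").map(simulateWorkerThread)
      Await.result(Future.sequence(producers), 30.seconds)
      Await.result(received, 30.seconds)
    } finally Await.result(system.terminate(), 30.seconds)
  }
}
```

Because every queue has a single producer that waits for its previous offer, a full queue slows that shard down instead of failing the stream. The trade-off is one materialized sub-stream per shard.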
