aserrallerios / kcl-akka-stream
Custom Akka Stream Sources and Flows to interact with Kinesis streams using the Kinesis Client Library
License: Other
Are there any plans to upgrade the version of the KCL used? 2.x adds some additional features that we'd be interested in using.
We'll probably try to build a version that works anyway, but don't want to repeat work that's already being done.
@aserrallerios You have made a few PRs to enable KCL in alpakka kinesis, according to https://github.com/akka/alpakka/commits/master/kinesis. Do you recommend using alpakka kinesis or this library?
Thank you.
I want to use this library from Java, and I have declared it in my build.gradle as
compile group: 'aserrallerios', name: 'kcl-akka-stream_2.12', version: '0.6'
but my build fails with: Could not resolve aserrallerios:kcl-akka-stream_2.12:0.6.
Any help?
I think the checkpointer should be called here, without arguments:
https://github.com/aserrallerios/kcl-akka-stream/blob/master/src/main/scala/aserralle/akka/stream/kcl/IRecordProcessor.scala#L56
As indicated in the docs, this tells the application that all records on the shard have been processed and it is safe to mark as such in DynamoDB.
For a split or merge operation, the KCL won't start processing the new shards until the processors for the original shards have called checkpoint to signal that all processing on the original shards is complete.
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-implementation-app-java.html
The problem is that, in my case, a shard split caused the end of each shard to be reached. But when shutdown is called without a final checkpoint, KCL throws the following error and doesn't let the application progress:
Sep 27 20:48:38 ip-172-31-81-83 java[12490]: 20:48:38.826 ERROR [RecordProcessor-0057] c.a.s.k.c.lib.worker.ShutdownTask - Application exception.
Sep 27 20:48:38 ip-172-31-81-83 java[12490]: java.lang.IllegalArgumentException: Application didn't checkpoint at end of shard shardId-000000000130
Thanks for the help.
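To make the request above concrete, here is a minimal sketch of the behaviour being asked for. It is not the library's code: the types below are hypothetical stand-ins that only mirror the shape of the KCL 1.x processor API (a shutdown reason plus a checkpointer), so the snippet compiles without the AWS dependency. In the real KCL the corresponding reasons are TERMINATE, ZOMBIE and REQUESTED.

```scala
// Hypothetical stand-ins for the KCL types, for illustration only.
sealed trait ShutdownReason
object ShutdownReason {
  case object Terminate extends ShutdownReason // shard fully consumed (e.g. after a split or merge)
  case object Zombie extends ShutdownReason    // lease lost to another worker
  case object Requested extends ShutdownReason // graceful shutdown requested
}

trait Checkpointer {
  def checkpoint(): Unit
}

final case class ShutdownInput(reason: ShutdownReason, checkpointer: Checkpointer)

class EndOfShardAwareProcessor {
  def shutdown(input: ShutdownInput): Unit = input.reason match {
    case ShutdownReason.Terminate =>
      // End of shard: checkpoint without arguments so the shard is marked
      // as completely processed and the child shards can be started.
      input.checkpointer.checkpoint()
    case ShutdownReason.Zombie | ShutdownReason.Requested =>
      // This sketch deliberately does not checkpoint in the other cases.
      ()
  }
}
```

With this shape, the final checkpoint happens only when the shard has ended, which is the case that triggers the "Application didn't checkpoint at end of shard" error.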
Is it possible to start publishing artifacts against scala 2.13?
Hi,
I just noticed a small error in the code example:
val checkpointSettings = KinesisWorkerCheckpointSettings(100, 30 seconds)
KinesisWorkerSource(builder, workerSourceSettings)
.via(KinesisWorker.checkpointRecordsFlow(checkpointSettings))
.to(Sink.ignore)
KinesisWorkerSource(builder, workerSourceSettings).to(
KinesisWorker.checkpointRecordsSink(checkpointSettings))
should be
val checkpointSettings = KinesisWorkerCheckpointSettings(100, 30 seconds)
KinesisWorkerSource(builder, workerSourceSettings)
.via(KinesisWorkerSource.checkpointRecordsFlow(checkpointSettings))
.to(Sink.ignore)
KinesisWorkerSource(builder, workerSourceSettings).to(
KinesisWorkerSource.checkpointRecordsSink(checkpointSettings))
Take care and thanks for sharing your code
Pierre
When consuming a Kinesis stream with multiple shards, multiple worker threads are started (one per shard), each of which attempts to write to the underlying SourceQueue. If the downstream consumer is slow, the queue backpressures by not completing the returned Future until it has capacity.
Unfortunately, when multiple threads write to the SourceQueue, the first one to hit the full queue blocks on the returned Future, but a subsequent one is immediately handed a failed Future and the flow terminates.
The below test shows the issue:
In KinesisWorkerContext:
// Additional processor to be used by a separate shard worker
var recordProcessor2: v2.IRecordProcessor = _
...
recordProcessor2 = x.createProcessor()
Then add a test
"KinesisWorker Source" must {
  "not drop messages in case of back-pressure with multiple shard workers" in new KinesisWorkerContext
  with TestData {
    recordProcessor.initialize(initializationInput)
    recordProcessor2.initialize(initializationInput.withShardId("shard2"))
    for (i <- 1 to 5) { // 10 is a buffer size
      val record = org.mockito.Mockito.mock(classOf[Record])
      when(record.getSequenceNumber).thenReturn(i.toString)
      recordProcessor.processRecords(recordsInput.withRecords(List(record).asJava))
      recordProcessor2.processRecords(recordsInput.withRecords(List(record).asJava))
    }
    // expect to consume all 10 across both shards
    for (_ <- 1 to 10) sinkProbe.requestNext()
    // Each shard is assigned its own worker thread, so we get messages
    // from each thread simultaneously.
    def simulateWorkerThread(rp: v2.IRecordProcessor): Future[Unit] = {
      Future {
        for (i <- 1 to 25) { // 10 is a buffer size
          val record = org.mockito.Mockito.mock(classOf[Record])
          when(record.getSequenceNumber).thenReturn(i.toString)
          rp.processRecords(recordsInput.withRecords(List(record).asJava))
        }
      }
    }
    // send another batch to exceed the queue size - this is shard 1
    simulateWorkerThread(recordProcessor)
    // send another batch to exceed the queue size - this is shard 2
    simulateWorkerThread(recordProcessor2)
    // expect to consume all 25 with slow consumer
    for (_ <- 1 to 25) {
      sinkProbe.requestNext()
      Thread.sleep(100)
    }
    killSwitch.shutdown()
    sinkProbe.expectComplete()
  }
}
Trapping the exception you can see:
java.lang.IllegalStateException: You have to wait for previous offer to be resolved to send another request
I think that, rather than all workers sharing a single SourceQueue, each will need its own, with a MergeHub combining them into a single stream of records. However, it may be simpler to achieve this with a custom GraphStage.
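A rough sketch of the per-worker queue idea, assuming Akka Streams 2.6 (the dependency line and every name in the object are illustrative, not part of kcl-akka-stream). Each simulated shard worker gets its own back-pressured queue, so it only ever waits on its own pending offer, and a MergeHub fans the queues into one stream.

```scala
//> using scala "2.13"
//> using dep "com.typesafe.akka::akka-stream:2.6.20"

import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, MergeHub, Sink, Source, SourceQueueWithComplete}

import scala.concurrent.Await
import scala.concurrent.duration._

object PerShardQueues {

  // Runs the demo and returns the records seen downstream of the hub.
  def mergedRecords(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("per-shard-queues")
    import system.dispatcher

    try {
      // Materializing the hub yields a Sink that any number of producers can attach to.
      val (hubSink, collected) =
        MergeHub
          .source[String](perProducerBufferSize = 16)
          .take(4) // stop after four records so the demo terminates
          .toMat(Sink.seq)(Keep.both)
          .run()

      // Hypothetical helper: one back-pressured queue per shard worker.
      def newShardQueue(): SourceQueueWithComplete[String] =
        Source.queue[String](10, OverflowStrategy.backpressure).to(hubSink).run()

      val shard1 = newShardQueue()
      val shard2 = newShardQueue()

      // Each worker chains its offers on its own queue only, so no queue
      // ever sees a second offer while an earlier one is still pending.
      shard1.offer("shard1-a").flatMap(_ => shard1.offer("shard1-b"))
      shard2.offer("shard2-a").flatMap(_ => shard2.offer("shard2-b"))

      Await.result(collected, 10.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit =
    println(mergedRecords().sorted.mkString(", "))
}
```

The interleaving of records from the two queues is not deterministic, which is why the demo sorts before printing. A custom GraphStage could avoid the extra buffering per producer, at the cost of more code.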