awslabs / dynamodb-streams-kinesis-adapter

The Amazon DynamoDB Streams Adapter implements the Amazon Kinesis interface so that your application can use KCL to consume and process data from a DynamoDB stream.

License: Apache License 2.0

Java 100.00%

dynamodb-streams-kinesis-adapter's Introduction

DynamoDB Streams Kinesis Adapter for Java

DynamoDB Streams Kinesis Adapter implements the Amazon Kinesis interface so that your application can use KCL to consume and process data from a DynamoDB stream. You can get started in minutes using Maven.

Features

  • The DynamoDB Streams Kinesis Adapter for Amazon Kinesis Client Library (KCL) is the best way to ingest and process data records from DynamoDB Streams.
  • The KCL is designed to process streams from Amazon Kinesis, but by adding the DynamoDB Streams Kinesis Adapter, your application can process DynamoDB Streams instead, seamlessly and efficiently.

Release Notes

Latest Release (v1.6.0)

  • Upgrades Amazon Kinesis Client Library (KCL) to version 1.14.9, so customers can now use the DynamoDB Streams Adapter with KCL 1.14.9. However, the DynamoDB Streams Adapter does not inherit the performance optimizations available in KCL, such as support for child shards, shard synchronization, and deferred lease clean-up.
  • Fixes the bug which was causing errors in DynamoDB Streams Adapter with KCL version 1.14.0.
  • With upgrade to KCL version 1.14.9, the default shard prioritization strategy has been changed to NoOpShardPrioritization. To retain the existing behavior, DynamoDB Streams customers should explicitly update the shard prioritization strategy to ParentsFirstShardPrioritization if there was no explicit override done in the application.
  • Upgrades jackson-databind to version 2.12.7.1
  • This release uses Apache 2.0 license.

Release (v1.5.4)

  • Upgrades AWS Java SDK to version 1.12.130
  • Upgrades jackson-databind to version 2.12.6.1
  • Fixes logging in DynamoDBStreamsShardSyncer to log only the problematic shardId instead of logging all the shardIds

Release (v1.5.3)

  • Upgrades jackson-databind to version 2.9.10.7
  • Upgrades junit to version 4.13.1
  • Upgrades AWS Java SDK to version 1.11.1016

Release (v1.5.2)

  • Upgrades jackson-databind to version 2.9.10.5
  • Updates StreamsWorkerFactory to use KinesisClientLibConfiguration billing mode when constructing KinesisClientLeaseManager.

Release (v1.5.1)

  • Restores compile compatibility with KCL 1.13.3.
  • Fixes a performance issue that arose when using v1.5.0 with KCL 1.12 through 1.13.2.
  • Fixes a defect where MaxLeasesForWorker configuration was not being propagated to StreamsLeaseTaker.
  • Finished (SHARD_END) leases are now deleted only after at least 6 hours have passed since the shard was created. This further reduces the chance of lineage replay.

Release (v1.5.0)

  • Introduces the implementation of periodic shard sync in conjunction with Amazon Kinesis Client Library v1.11.x (KCL). The default shard sync strategy is to discover new/child shards only when a consumer completes processing a shard. This default strategy constrains horizontal scaling of customer applications when consuming tables with 10,000+ partitions due to increased DescribeStream calls. Periodic shard sync guarantees that only a subset of the fleet (by default 10) will perform shard syncs, and decouples DescribeStream call volume from growth in fleet size.

  • Improves inconsistency handling in DescribeStream result aggregation by fixing any parent-open-child-open cases. This ensures that shard sync does not fail due to an assertion failure in KCL on this type of inconsistency.

  • Modifies finished shard lease cleanup mechanism. Leases for shards that have been completely processed are now deleted only after all their children shards have been completely processed. This will prevent shard lineage replay issues, instances of which have been reported in the past by some customers.

  • Introduces StreamsLeaseTaker with improved load-balancing of leases among workers.

    • SHARD_END and non-SHARD_END check-pointed leases are balanced independently.
    • Leases are now stolen evenly from other workers instead of from only the most loaded worker. MaxLeasesToStealAtOneTime no longer needs to be specified by users. It is now determined automatically based on the number of leases held by the worker. The user-specified value for this is no longer used.
  • Users should continue using factory methods from StreamsWorkerFactory to create KCL Worker as specified in the guidance of Release v1.4.x.

  • We strongly recommend that you create only one worker per host in your processing fleet to get optimal performance from the DynamoDB Streams service.
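The finished-lease cleanup rule from this release can be sketched as follows. This is a minimal illustration, not the adapter's implementation: the Lease class, the isDeletable method, and the in-memory map are hypothetical names, and the sketch assumes that each lease records the IDs of its child shards.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the cleanup rule: a finished lease may be deleted
// only once every child shard has also been processed to SHARD_END.
// These types are hypothetical and are not the adapter's own classes.
class LeaseCleanupSketch {

    static final String SHARD_END = "SHARD_END";

    static final class Lease {
        final String shardId;
        final String checkpoint;
        final List<String> childShardIds;

        Lease(String shardId, String checkpoint, List<String> childShardIds) {
            this.shardId = shardId;
            this.checkpoint = checkpoint;
            this.childShardIds = childShardIds;
        }

        boolean isFinished() {
            return SHARD_END.equals(checkpoint);
        }
    }

    /** True if the lease is finished and all of its children are finished too. */
    static boolean isDeletable(Lease lease, Map<String, Lease> leasesByShardId) {
        if (!lease.isFinished()) {
            return false;
        }
        for (String childId : lease.childShardIds) {
            Lease child = leasesByShardId.get(childId);
            // Conservative choice: a child without a lease is treated as unprocessed.
            if (child == null || !child.isFinished()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Lease> leases = new HashMap<>();
        leases.put("parent", new Lease("parent", SHARD_END, Arrays.asList("child")));
        leases.put("child", new Lease("child", "TRIM_HORIZON", Collections.<String>emptyList()));
        // The child is still being processed, so the parent lease must be kept.
        System.out.println("parent deletable: " + isDeletable(leases.get("parent"), leases));
    }
}
```

Keeping the parent lease until its children finish means a worker that rediscovers the parent shard still sees it as completed, rather than starting it again from the beginning.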

Release (v1.4.x)

  • This release fixes an issue of high propagation delay of streams records when processing streams on small tables. This issue occurs when KCL ShardSyncer is not discovering new shards due to server side delays in shard creation or in reporting new shard creation to internal services. The code is implemented in a new implementation of IKinesisProxy interface called DynamoDBStreamsProxy which is part of the latest release.
  • This release requires Kinesis Client Library version >= 1.8.10. Version 1.8.10 has changes to allow IKinesisProxy injection into the KCL Worker builder which is required by DynamoDB Streams Kinesis Adapter v1.4.x for injection of DynamoDBStreamsProxy into the KCL worker during initialization. Please refer to Kinesis Client Library release notes for 1.8.10 for more information.
  • Suggested AWS Java SDK version >= 1.11.218
  • It is highly recommended to configure Kinesis Client Library with MaxRecords = 1000 and IdleTimeInMillis = 500 to optimize DynamoDB Streams costs.

Guidance for injecting DynamoDBStreamsProxy into KCL worker when using DynamoDB Streams Kinesis Adapter v1.4.x.

To fix high propagation delay problems, opt in to using DynamoDBStreamsProxy (instead of the default KinesisProxy) by using the StreamsWorkerFactory factory method shown below. This injects an instance of DynamoDBStreamsProxy into the created KCL worker.

       final Worker worker = StreamsWorkerFactory
           .createDynamoDbStreamsWorker(
               recordProcessorFactory,
               workerConfig,
               adapterClient,
               amazonDynamoDB,
               amazonCloudWatchClient);

Getting Started

  1. Sign up for AWS - Before you begin, you need an AWS account. Please see the AWS Account and Credentials section of the developer guide for information about how to create an AWS account and retrieve your AWS credentials. You don’t need this if you’re using DynamoDB Local.
  2. Minimum requirements - To run the SDK you will need Java 1.8+. For more information about the requirements and optimum settings for the SDK, please see the Java Development Environment section of the developer guide.
  3. Install the DynamoDB Streams Kinesis Adapter - Using Maven is the recommended way to install the DynamoDB Streams Kinesis Adapter and its dependencies, including the AWS SDK for Java. To download the code from GitHub, simply clone the repository by typing: git clone https://github.com/awslabs/dynamodb-streams-kinesis-adapter.git, and run the Maven command described below in "Building From Source". You may also depend on the maven artifact com.amazonaws:dynamodb-streams-kinesis-adapter.
  4. Build your first application - There is a walkthrough to help you build your first application using this adapter. Please see Using the DynamoDB Streams Kinesis Adapter to Process Stream Records.

Including as a Maven dependency

Add the following to your Maven pom file:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>dynamodb-streams-kinesis-adapter</artifactId>
    <version>1.5.1</version>
</dependency>

Building From Source

Once you check out the code from GitHub, you can build it using Maven: mvn clean install

dynamodb-streams-kinesis-adapter's People

Contributors

afitzgibbon, aggarwal, amcp, csan6, dependabot[bot], gguptp, hyandell, jamesiri, johnmshields, leiye, parijatsinha, schwar

dynamodb-streams-kinesis-adapter's Issues

What is the cost for using DynamoDB Stream Kinesis Adapter + Kinesis Client Library to read DDB Stream

Quote from DDB Pricing for On-Demand Capacity:
DynamoDB charges for reading data from DynamoDB Streams in read request units. Each GetRecords API call is billed as a streams read request unit and returns up to 1 MB of data from DynamoDB Streams. Streams read request units are unique from read requests on your DynamoDB table. You are not charged for GetRecords API calls invoked by AWS Lambda as part of DynamoDB triggers. You also are not charged for GetRecords API calls invoked by DynamoDB global tables.

It seems that 1 GetRecords API call = 1 read request unit. After switching to the adapter and the KCL, our code no longer makes any GetRecords API calls directly. How can we know how many read request units we consume? Can you share more details in the README on how to calculate the cost?
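One rough way to reason about this is to count polls. The sketch below assumes, and this is an assumption rather than something the pricing page or the adapter documents, that the KCL issues about one GetRecords call per open shard per idle interval, and that each call is billed as one streams read request unit as quoted above. The class and method names are hypothetical.

```java
// Rough upper-bound estimate of GetRecords calls made on your behalf.
// Assumption: each open shard is polled about once per idle interval,
// and every poll is a single GetRecords call.
class StreamsReadEstimate {

    /** Estimated GetRecords calls over a period for a shard count and idle time. */
    static long estimateGetRecordsCalls(int openShards, long idleTimeMillis, long periodSeconds) {
        if (openShards < 0 || idleTimeMillis <= 0 || periodSeconds < 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long periodMillis = periodSeconds * 1000L;
        long pollsPerShard = periodMillis / idleTimeMillis;
        return openShards * pollsPerShard;
    }

    public static void main(String[] args) {
        // 4 open shards, an idle time of 500 ms, over one day (86,400 seconds).
        long calls = estimateGetRecordsCalls(4, 500, 86_400);
        System.out.println("Estimated GetRecords calls per day: " + calls);
    }
}
```

Under these assumptions the call volume grows with the number of open shards and shrinks as the idle time increases, which is why the idle time setting matters for cost. Real call counts may be lower, since each poll also takes time to complete.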

Should have a builder

There should be a builder for AmazonDynamoDBStreamsAdapterClients that returns an instance as a generic AmazonKinesis object.

wrong aws java sdk version specified in pom.xml

pom.xml currently specifies version 1.11.218 for various aws sdks:

<aws-java-sdk.version>1.11.218</aws-java-sdk.version>

However, it also specifies use of 1.9.0 for the KCL, which makes use of kinesis.model.ListShardsRequest, which appears to have been added in 1.11.272. As a result, this error occurs in programs using dynamodb-streams-kinesis-adapter 1.4.0 if they only install the dynamo streams adapter and its dependencies without separately installing KCL 1.9.0:

SEVERE: Caught throwable while processing data.
java.lang.NoClassDefFoundError: com/amazonaws/services/kinesis/model/ListShardsRequest
	at com.amazonaws.services.kinesis.clientlibrary.proxies.KinesisProxy.listShards(KinesisProxy.java:291)
	at com.amazonaws.services.kinesis.clientlibrary.proxies.KinesisProxy.getShardList(KinesisProxy.java:365)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.getShardList(ShardSyncer.java:319)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.syncShardLeases(ShardSyncer.java:121)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.checkAndCreateLeasesForNewShards(ShardSyncer.java:90)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask.call(ShardSyncTask.java:71)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.initialize(Worker.java:504)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.run(Worker.java:436)
	at com.amazonaws.services.kinesis.multilang.MultiLangDaemon.call(MultiLangDaemon.java:114)
	at com.amazonaws.services.kinesis.multilang.MultiLangDaemon.call(MultiLangDaemon.java:61)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.kinesis.model.ListShardsRequest
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 15 more

dynamodb-streams-kinesis-adapter broken on newest kcl 1.14.0 version

See awslabs/amazon-kinesis-client#746 for more background:

Hello - yesterday I upgraded to KCL 1.14.0 for our application, which uses DynamoDB Streams for processing. Since then I've noticed these very consistent errors. We've seen tens of thousands of these in just a few hours, repeated for the same shard IDs.

ERROR [2020-10-20 17:07:45,185] [RecordProcessor-0015] c.a.s.k.c.lib.worker.ProcessTask: ShardId shardId-00000001603151190143-090943ac: Caught exception: 
com.amazonaws.SdkClientException: Shard shardId-00000001603151190143-090943ac: GetRecordsResult is not valid. NextShardIterator: null. ChildShards: []
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisDataFetcher$AdvancingResult.accept(KinesisDataFetcher.java:126)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.SynchronousGetRecordsRetrievalStrategy.getRecords(SynchronousGetRecordsRetrievalStrategy.java:31)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.BlockingGetRecordsCache.getNextResult(BlockingGetRecordsCache.java:50)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.getRecordsResultAndRecordMillisBehindLatest(ProcessTask.java:377)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.getRecordsResult(ProcessTask.java:342)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.call(ProcessTask.java:159)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

As best I can tell, a GetRecordsResult with a null NextShardIterator and no child shards is a valid response - in fact there is no field specified for child shards at all here:
docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_GetRecords.html

Using KCL 1.14.0, and creating a worker using dynamodb-streams-kinesis-adapter 1.5.2. The worker is set up using this method: https://github.com/awslabs/dynamodb-streams-kinesis-adapter/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/streamsadapter/StreamsWorkerFactory.java#L44

I am not setting any special configuration other than the following, which I believe shouldn't be relevant.

    KinesisClientLibConfiguration config =
            new KinesisClientLibConfiguration(applicationName, streamName, credentialsProvider, workerId)
                    .withInitialPositionInStream(initialPositionInStream)
                    .withMaxRecords(kclConfiguration.getMaxRecordsToFetch())
                    .withTaskBackoffTimeMillis(kclConfiguration.getBackOffTimeMillis())
                    .withIdleTimeBetweenReadsInMillis(kclConfiguration.getIdleTimeBetweenReads());

The code does not compile

KinesisClientLeaseManager(java.lang.String,com.amazonaws.services.dynamodbv2.AmazonDynamoDB)

The KinesisClientLeaseManager constructor has changed: in the amazon-kinesis-client 1.13.1 jar it now also requires a BillingMode argument.
Since this change was made to an existing class, a code change is required everywhere the constructor is used.

UnsupportedClassVersionError for Java 7 runtime

I am getting the error below:

[2016-07-08 00:38:25.352 EDT] o.s.b.SpringApplication - ERROR: Application startup failed
java.lang.UnsupportedClassVersionError: com/amazonaws/services/kinesis/clientlibrary/interfaces/IRecordProcessorFactory : Unsupported major.minor version 52.0 (unable to load class com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorFactory)
at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:3111) ~[catalina.jar:7.0.64]
at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:1348) ~[catalina.jar:7.0.64]
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1828) ~[catalina.jar:7.0.64]
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1709) ~[catalina.jar:7.0.64]

Root cause: amazon-kinesis-client 1.6.4 was released on July 6th, 2016, and it appears to have been compiled with JDK 1.8.
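The "major.minor version 52.0" in the error is the class file format version: 52 corresponds to Java 8 and 51 to Java 7, so a Java 7 runtime cannot load the class. As a quick check, the version can be read from the first 8 bytes of any .class file. The sketch below is illustrative only; ClassFileVersion and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

// Reads the class file format version from the header of a compiled class.
// Header layout: 4-byte magic 0xCAFEBABE, 2-byte minor version, 2-byte major version.
class ClassFileVersion {

    /** Returns the major version stored in the first 8 bytes of a .class file. */
    static int majorVersion(byte[] classFileHeader) {
        if (classFileHeader.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(classFileHeader); // big-endian by default
        if (buffer.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buffer.getShort(); // minor version, not needed here
        return buffer.getShort() & 0xFFFF;
    }

    /** Maps a major version to the Java release (valid for Java 5 and later). */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        // Header bytes as they would appear in a class compiled for Java 8.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        int major = majorVersion(header);
        System.out.println("major " + major + " requires Java " + javaRelease(major));
    }
}
```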

In pom.xml, I am not sure why you define a version range for amazon-kinesis-client instead of depending on a fixed version. This causes unexpected errors like this one all of a sudden:
<amazon-kinesis-client.version>[1.6.0, 1.7.0)</amazon-kinesis-client.version>
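The range [1.6.0, 1.7.0) is half-open: it admits any version from 1.6.0 up to, but not including, 1.7.0, so a newly published 1.6.4 is picked up without any change to the POM. The sketch below mimics that behaviour with a simplified numeric comparison. It assumes the resolver picks the highest available version inside the range, and it is not Maven's actual version-ordering algorithm; all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of resolving a half-open version range "[lower, upper)".
class VersionRangeSketch {

    /** Compares dotted numeric versions such as "1.6.4"; missing parts count as 0. */
    static int compare(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int length = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < length; i++) {
            int x = i < partsA.length ? Integer.parseInt(partsA[i]) : 0;
            int y = i < partsB.length ? Integer.parseInt(partsB[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True if lowerInclusive <= version < upperExclusive. */
    static boolean inRange(String version, String lowerInclusive, String upperExclusive) {
        return compare(version, lowerInclusive) >= 0 && compare(version, upperExclusive) < 0;
    }

    /** Highest available version inside the range, or null if none matches. */
    static String resolve(List<String> available, String lowerInclusive, String upperExclusive) {
        String best = null;
        for (String version : available) {
            if (inRange(version, lowerInclusive, upperExclusive)
                    && (best == null || compare(version, best) > 0)) {
                best = version;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> published = Arrays.asList("1.6.3", "1.6.4", "1.7.0");
        System.out.println(resolve(published, "1.6.0", "1.7.0"));
    }
}
```

Depending on a single fixed version instead of a range keeps a build from changing when a new release is published.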

Worker goes idle forever

In one of our applications, we have observed that DynamoDB Streams processing sometimes stops until application is restarted. The first time it happened it caused quite a headache, as we discovered it more than 24 hours later (some data was no longer available in the stream). Now, with monitoring in place, we can see it happens every few days (happened 4 times so far). We have observed the following:

  • It starts idling after reaching SHARD_END (not every time though). RecordProcessor is shut down with status TERMINATE and no new RecordProcessor is created. ShutdownTask does not report CreateLeases metrics, which it usually does.
  • When idling, there is no RecordProcessor thread and worker repeatedly logs that it has No activities assigned. We can see in lease table that there is only one shard with checkpoint at SHARD_END. When refreshing the table, we can see that leaseCounter gets incremented. The TakeLeases and RenewAllLeases operations keep successfully running (by successfully I mean it reports success in metrics). LeaseTaker sees no new shards to take.
  • After restart new shards are added to the lease table with checkpoint at TRIM_HORIZON, one is child of the shard with checkpoint at SHARD_END and parent of the other shard with TRIM_HORIZON checkpoint. The application resumes processing where it left off (or at oldest available data).

Checking KCL library implementation, we have noticed that LeaseTaker will take new leases only if these are available in the lease table. Discovering and inserting new leases to lease table happens only on 2 occasions: on worker initialization and on reaching shard end. We suspect that sometimes when shard end is reached and shards are listed, information about new shards is not yet available. Because of that, no new shards are inserted into lease table and so LeaseTaker will not see the new shards. As no shard is being consumed, no shard end is reached, no shards are ever inserted to lease table, and so the worker stays idle forever. Given there is more than one worker instance, the problem is probably less visible, since shards will be synced again when another worker finishes its shard, unlocking the idle worker. Nevertheless, there will be a period where worker is idle because shards are not in sync in lease table.

I am not sure, whether this issue belongs to KCL library or the DynamoDB Adapter. It seems KCL is working under assumption, that information about new shards is always available before shard end is reached. I don't know whether this assumption is intentional and violated by the Adapter, or whether the assumption is wrong and has to be fixed in KCL. Therefore I created this issue in both projects. The same issue in the other project: awslabs/amazon-kinesis-client#442

Libraries used:

  • com.amazonaws:dynamodb-streams-kinesis-adapter:1.4.0
  • com.amazonaws:amazon-kinesis-client:1.9.0

Question: Is the AmazonDynamoDBStreamsAdapterClient threadsafe?

I have an application that consumes multiple DynamoDB streams from different tables. I'd like to reuse stateless clients where I can.

Is it acceptable to create different KCL workers that use the same AmazonDynamoDBStreamsAdapterClient instance, where each worker is consuming a different table stream?

Vulnerabilities found in com.fasterxml.jackson.core:jackson-databind

Hello,
I'm working to resolve several vulnerabilities in my current project, which uses this library for reading streams. The problem is that you're using com.fasterxml.jackson.core:jackson-databind:2.9.10.7, and that version has a lot of vulnerabilities. You can see them at the following link: https://mvnrepository.com/artifact/com.amazonaws/dynamodb-streams-kinesis-adapter/1.5.3
I would like to know if you're going to update your last version of this library or if you have a new library I can use as an alternative or something like that.
Thanks!

amazon-kinesis-client 1.13.1 breaks dynamodb-streams-kinesis-adapter

The newly released version of amazon-kinesis-client (1.13.1) is incompatible with version 1.5.0 of dynamodb-streams-kinesis-adapter.

This could be fixed by removing an open-ended dependency on amazon-kinesis-client.

java.lang.NoSuchMethodError: com.amazonaws.services.kinesis.leases.impl.KinesisClientLeaseManager.<init>(Ljava/lang/String;Lcom/amazonaws/services/dynamodbv2/AmazonDynamoDB;)V

Build compatible with client-lib 1.7.3?

Would it be possible to get a build of this project that is compatible with kinesis-client 1.7.3 (AWS SDK 1.11.76)?

I am working on integrating the DDB stream adapter into Spark so that Spark streaming apps can read from a DDB stream. They are currently using SDK version 1.11.76 and kinesis-client 1.7.3 (I think mostly due to constraints of being compatible with Hadoop). The ddb-streams 1.1.1 build is compatible from 1.6.4 to 1.7.0 and the 1.2.0 build from 1.7.5 to 1.8.0, but no builds are compatible with 1.7.3. It would really help me out if there was a build available that was compatible in the 1.7.0-1.7.5 range.

Thanks!

building from sources error

After running Maven build command I got the error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project dynamodb-streams-kinesis-adapter: Fatal error compiling: invalid target release: 1.7 -> [Help 1]

I am on Mac OS 10.10.2. Installed Maven 3.2.5 with homebrew. Java version "1.8.0_25"

amazon-kinesis-client 1.13.3 breaks dynamodb-streams-kinesis-adapter

Hi, this issue is pretty similar to #27: the newly released version of amazon-kinesis-client, 1.13.3, introduced a compatibility issue with dynamodb-streams-kinesis-adapter.

The incompatibility is caused by a new method added to the IKinesisProxy interface.

It leads to the following error:

com.amazonaws.services.dynamodbv2.streamsadapter.DynamoDBStreamsProxy does not define or inherit an implementation of the resolved method 'abstract com.amazonaws.services.kinesis.clientlibrary.proxies.ShardClosureVerificationResponse verifyShardClosure(java.lang.String)' of interface com.amazonaws.services.kinesis.clientlibrary.proxies.IKinesisProxy.
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownTask.call(ShutdownTask.java:108)
...

As @tmszdmsk already mentioned, a temporary solution is to change the version range to 1.13.2.

Add support for the latest KCL versions (>1.7.1)

Using this library in conjunction with KCL 1.7.3 is currently not possible, because the signature and package of several classes have changed. For example,
com.amazonaws.services.kinesis.clientlibrary.types.ShutdownReason
moved to
com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason
In my project, I use the latest KCL and have to exclude the KCL included in dynamodb-streams-kinesis-adapter, which causes compilation errors.

[Multilang] Handle sigterm shutdown

updated release for latest KCL

The KCL is at version 1.9.0, but the public release of this adapter still specifies 1.7.5/1.7.6. Can a new release of this adapter be produced targeting the latest KCL?

Dynamodb Streams Kinesis Adapter for Python

  1. Is there an existing solution for processing DynamoDB streams with the KCL in Python?

  2. As far as I know, only the Java adapter is currently available. If that's true, how can we process DynamoDB streams in Python?

    TIA

Is there a version of the adapter for KCL v2.x

As per title. I am looking at consuming dynamodb stream via KCL and ended up here.
I need to use KCL v2.x, but this library seems to be dealing with KCL v1.x.
Is there an equivalent implementation for KCL v2.x?

DynamoDB Streams lag monitoring

Hello!
We're using DynamoDB Streams + Kinesis Client Library (KCL).
How can we measure the latency between when an event was created in a stream and when it was processed on the KCL side?

As far as I know, KCL's MillisBehindLatest metric is specific to Kinesis Streams.
The approximateCreationDateTime record attribute has minute-level granularity, which is not acceptable for monitoring systems with sub-second latency.
Could you please suggest some useful metrics for monitoring DynamoDB Streams latency?

Thank you!

Ivan

Cannot find the shard given the shardId - Non stop log spam

I was having issues as per #20 for the longest time. Finally with the latest updates to this library and KCL 1.9.2 I don't seem to be having stuck streams.

However, I am constantly seeing these logs spamming at a warning level:

2018-10-23 17:41:56 WARN c.a.s.d.s.DynamoDBStreamsProxy - Cannot find the shard given the shardId shardId-xxxxx
2018-10-23 17:41:56 WARN c.a.s.k.c.lib.worker.ProcessTask - Cannot get the shard for this ProcessTask, so duplicate KPL user records in the event of resharding will not be dropped during deaggregation of Amazon Kinesis records.

I've looked up some issues in the KCL library but nothing providing a solid answer to my situation there. It seems somehow related to the way the DynamoDB streams proxy works as I wasn't seeing this log spam on the KCL side until I updated to the latest version of this code and started using the new construction method.

I've checked the leases in DynamoDB, and I've often seen only one lease entry for the shard being complained about. It's a very simple setup right now, based on the sample code: two worker threads in a single process processing the streams, and usually only one shard.

I've seen:

  • one thread with a valid shard assignment, one thread with no assignments
  • one thread with an invalid assignment (cannot find, spamming), one thread with no assignments
  • one thread with a valid assignment, one thread with an invalid assignment (cannot find, spamming)

Suppress some INFO and WARNING log

Hi,

There are a few INFO log entries below, and they are repeatedly printed:

INFO: getShardList: begin
INFO: getShardList: done

INFO: syncShardLeases: begin
INFO: syncShardLeases: done

INFO: cleanupLeasesOfFinishedShards: begin
INFO: cleanupLeasesOfFinishedShards: done

INFO: cleanupGarbageLeases: begin
INFO: cleanupGarbageLeases: done

INFO: determineNewLeasesToCreate: begin
INFO: determineNewLeasesToCreate: done

I tried to suppress them in my logback settings but it doesn't seem to work

<logger name="com.amazonaws.services.dynamodbv2.streamsadapter" level="WARN" />

Any suggestion how to suppress them?

One more thing: there are a couple of WARNING log entries during startup,
and I think they should be INFO level instead of WARNING:

WARNING: Received configuration for region as eu-west-1.

Regards,
Tien

question about shard split

I still have a question about shard splits. If one partition splits into two, will the shards also be split? E.g., there are 4 shards in partition1, and then partition1 splits into partition2 and partition3. Will the shards also be split so that there are 4 shards on each new partition? Or will the old 4 shards on the partition not be split, with only the new stream data being split into partition1/shard0 and partition2/shard0?

Issues copying the project dependencies through maven due to DynamoDBLocal's last release being published in a repository other than the specified in the POM

dynamodb-streams-kinesis-adapter includes DynamoDBLocal as a test-only dependency in its POM. At work, where we use this project, I noticed that one of our systems could no longer be built. The adapter's version constraint for the test dependency is [1.12,2.0), and the latest valid 1.XX.X release (1.24.0, according to the Release history for DynamoDB local page) has not been uploaded to the repository that the POM specifies for the dependency:

<repositories>
    <repository>
        <id>dynamodblocal</id>
        <name>AWS DynamoDB Local Release Repository</name>
        <url>https://s3-us-west-2.amazonaws.com/dynamodb-local/release</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

This can be replicated running the following snippet:

$ mkdir /tmp/jars
$ curl -L -o /tmp/jars/dynamodb-streams-kinesis-adapter-1.5.3.pom "http://search.maven.org/remotecontent?filepath=com/amazonaws/dynamodb-streams-kinesis-adapter/1.5.3/dynamodb-streams-kinesis-adapter-1.5.3.pom"
$ mvn -B -f /tmp/jars/dynamodb-streams-kinesis-adapter-1.5.3.pom dependency:copy-dependencies -DoutputDirectory=/tmp/jars -DincludeScope=runtime

The issue is still present with the latest release of the kinesis adapter.

Not able to get this to work with assumeRole credentials

Perhaps this is by design, but the shard sync background thread traps all exceptions and issues log warnings, but does not provide any mechanism for handling errors or bubble the exceptions up to any calling code. We have a situation where we need the same process to talk to dynamodb endpoints in multiple AWS accounts, so the dynamodb client we pass to the streams adapter uses a token generated by sts AssumeRoleRequest. However, those credentials expire after a time and the streams kinesis adapter just goes into an infinite loop of logging error messages.

We tried creating a custom AwsCredentialsProvider that will detect the expired credentials and issue a new AssumeRoleRequest whenever getCredentials() is called, but it seems that that code is never called by the background thread (we see only 1 logging instance of it being called when we first initialize the stream).

We're using version 1.5.1 of this project with version 1.11.828 of the AWS SDK.

Is there some way to force it to refresh the credentials or at a minimum bubble up the exception so we can trap it and restart the stream with new credentials?
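One common pattern for this kind of problem is a provider that caches temporary credentials and refreshes them shortly before they expire. The sketch below shows only that refresh logic in plain Java. RefreshingToken, Token, and the fetcher are hypothetical names, the fetcher stands in for whatever performs the AssumeRole call, and nothing here confirms whether the adapter's background thread consults the provider on each request.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Caches a short-lived token and fetches a new one shortly before expiry.
// The fetcher is a stand-in for a call that obtains temporary credentials.
class RefreshingToken {

    static final class Token {
        final String value;
        final Instant expiresAt;

        Token(String value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Supplier<Token> fetcher;
    private final Supplier<Instant> clock;
    private final Duration refreshMargin;
    private Token current;

    RefreshingToken(Supplier<Token> fetcher, Supplier<Instant> clock, Duration refreshMargin) {
        this.fetcher = fetcher;
        this.clock = clock;
        this.refreshMargin = refreshMargin;
    }

    /** Returns the cached token, refreshing it first if it is missing or about to expire. */
    synchronized String get() {
        Instant now = clock.get();
        boolean stale = current == null
                || !now.isBefore(current.expiresAt.minus(refreshMargin));
        if (stale) {
            current = fetcher.get();
        }
        return current.value;
    }

    public static void main(String[] args) {
        RefreshingToken token = new RefreshingToken(
                () -> new Token("token-" + System.nanoTime(), Instant.now().plusSeconds(3600)),
                Instant::now,
                Duration.ofMinutes(5));
        // The second call is served from the cache, so both values match.
        System.out.println(token.get().equals(token.get()));
    }
}
```

The AWS SDK for Java 1.x also ships STSAssumeRoleSessionCredentialsProvider, which refreshes assumed-role sessions itself. It may be worth checking whether the client handed to the adapter was built with a provider like that rather than with a fixed set of credentials.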

Stacktrace (if helpful):

com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The security token included in the request is expired (Service: AmazonDynamoDBStreams; Status Code: 400; Error Code: ExpiredTokenException; Request ID: G762UUV7S4ASDEQISM6U4CET2VVV4KQNSO5AEMVJF66Q9ASUAAJG)
       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
       at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
       at com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClient.doInvoke(AmazonDynamoDBStreamsClient.java:686)
       at com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClient.invoke(AmazonDynamoDBStreamsClient.java:653)
       at com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClient.invoke(AmazonDynamoDBStreamsClient.java:642)
       at com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClient.executeDescribeStream(AmazonDynamoDBStreamsClient.java:361)
       at com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClient.describeStream(AmazonDynamoDBStreamsClient.java:332)
       at com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient.describeStream(AmazonDynamoDBStreamsAdapterClient.java:250)
       at com.amazonaws.services.dynamodbv2.streamsadapter.DynamoDBStreamsProxy.getStreamInfo(DynamoDBStreamsProxy.java:166)
       at com.amazonaws.services.dynamodbv2.streamsadapter.DynamoDBStreamsProxy.buildShardGraphSnapshot(DynamoDBStreamsProxy.java:279)
       at com.amazonaws.services.dynamodbv2.streamsadapter.DynamoDBStreamsProxy.getShardList(DynamoDBStreamsProxy.java:221)
       at com.amazonaws.services.dynamodbv2.streamsadapter.DynamoDBStreamsShardSyncer.getShardList(DynamoDBStreamsShardSyncer.java:320)
       at com.amazonaws.services.dynamodbv2.streamsadapter.DynamoDBStreamsShardSyncer.syncShardLeases(DynamoDBStreamsShardSyncer.java:124)
       at com.amazonaws.services.dynamodbv2.streamsadapter.DynamoDBStreamsShardSyncer.checkAndCreateLeasesForNewShards(DynamoDBStreamsShardSyncer.java:100)
       at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.checkAndCreateLeasesForNewShards(ShardSyncer.java:41)
       at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask.call(ShardSyncTask.java:84)
       at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
       at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTaskManager.lambda$checkAndSubmitNextTask$0(ShardSyncTaskManager.java:119)

KCL is unable to renew the lease due to an issue calling renewLeases

An internal server error occurs at some point during the lifetime of a table: a DependencyException is thrown when a DynamoDB update fails in an unexpected way, indicating that the KCL is unable to renew the lease because of a problem calling renewLeases on the application's lease table (the Kinesis library is having trouble talking to DynamoDB).

The worker tries to call renewLeases and logs an error: "Encountered an exception while renewing a lease" (it increases a lease to 1).

"message":"LeasingException encountered in lease renewing thread"
"stack_trace":"com.amazonaws.services.kinesis.leases.exceptions.DependencyException: Encountered an exception while renewing leases.."

amazon-kinesis-client:1.9.1
aws-java-sdk-dynamodb:1.11.382
dynamodb-streams-kinesis-adapter:1.4.0

Any idea how to fix it?
Thanks

Backwards-incompatible change in 1.6.0 causes NullPointerException

Hi, we have some code that invokes StreamsWorkerFactory.createDynamoDbStreamsWorker like so:

StreamsWorkerFactory.createDynamoDbStreamsWorker(
      /* recordProcessorFactory = */
      transactionAuthRequestsStreamsRecordProcessorFactory,
      /* config = */
      kinesisClientLibConfiguration,
      /* streamsClient = */
      amazonDynamoDBStreamsAdapterClient,
      /* dynamoDBClient = */
      dynamoDBClient,
      /* cloudWatchClient = */
      null
)

In version 1.5.3 of this lib, this worked fine. In 1.6.0 it throws a NullPointerException with this at the top of the stack:

java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "cloudWatchClient" is null
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.getMetricsFactory(Worker.java:1241)
	at com.amazonaws.services.dynamodbv2.streamsadapter.StreamsWorkerFactory.createDynamoDbStreamsWorker(StreamsWorkerFactory.java:127)

It appears that passing a null cloudWatchClient is no longer allowed. Is this intentional?

Only one worker gets records when multiple workers are on the same stream

We have a listener worker connected to a DynamoDB table stream to read item insert events. Everything works fine with version 1.5.1. But the moment I start another JVM with the same listener worker, only the first one keeps getting records/events. In the listener that was started second, I can see the following log. So my question is: given that this works with one Kinesis adapter client and the second consumer does not receive events, how can we ensure that each KCL client pointing to the same stream gets a copy of the event? It looks like only one gets it.

INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - No activities assigned
INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Sleeping ...
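As far as I understand KCL 1.x, workers that share an application name coordinate through one lease table, and each shard lease is held by one worker at a time. That would be consistent with the "No activities assigned" log above if the stream has a single shard. The toy model below only illustrates that idea and is not KCL code: `LeaseTableModel` and `tryAcquire` are hypothetical names, and lease expiry, renewal, and stealing are deliberately ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of lease ownership, keyed by application name and shard ID.
final class LeaseTableModel {
    // Key: applicationName + "/" + shardId, value: ID of the worker holding the lease.
    private final Map<String, String> leaseOwners = new HashMap<>();

    // Returns true if workerId holds the lease for the shard after this call.
    boolean tryAcquire(String applicationName, String shardId, String workerId) {
        String key = applicationName + "/" + shardId;
        String existingOwner = leaseOwners.putIfAbsent(key, workerId);
        return existingOwner == null || existingOwner.equals(workerId);
    }
}
```

In this model a second worker under the same application name gets nothing for an already-leased shard, while a worker under a different application name gets its own lease and so would read the shard independently.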

Add `listShards` to `AmazonDynamoDBStreamsAdapterClient`

AmazonDynamoDBStreamsAdapterClient is being used by https://github.com/spring-projects/spring-integration-aws to switch from the DynamoDB Streams model to the Kinesis stream model. The problem is that one of the operations it uses is ListShards. DynamoDB Streams doesn't support this operation, so that is currently broken in that project: spring-projects/spring-integration-aws#181. It uses ListShards instead of DescribeStream because of this issue: spring-cloud/spring-cloud-stream-binder-aws-kinesis#134, where some people were having problems with the DescribeStream operation limits.

Would you consider adding support for the ListShards operation by delegating it to the DescribeStream API in the case of a DynamoDB stream? If so, let me know and I can open a PR like this one: spring-cloud/spring-cloud-stream-binder-aws-kinesis#144 against this repo instead.
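A rough sketch of what such a delegation could look like, assuming the usual DescribeStream pagination in which each response carries a last evaluated shard ID that is passed back as the exclusive start shard ID of the next call. `ShardPage` and `ListShardsViaDescribeStream` are hypothetical names used only for illustration, and the `describeStream` function stands in for the real client call; this is not the adapter's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-in for one page of a DescribeStream response.
final class ShardPage {
    final List<String> shardIds;
    final String lastEvaluatedShardId; // null when there are no more pages

    ShardPage(List<String> shardIds, String lastEvaluatedShardId) {
        this.shardIds = shardIds;
        this.lastEvaluatedShardId = lastEvaluatedShardId;
    }
}

final class ListShardsViaDescribeStream {
    // Collects every shard ID by calling describeStream until no page token remains.
    // The function receives the exclusive start shard ID (null for the first page).
    static List<String> listShards(Function<String, ShardPage> describeStream) {
        List<String> allShardIds = new ArrayList<>();
        String exclusiveStartShardId = null;
        do {
            ShardPage page = describeStream.apply(exclusiveStartShardId);
            allShardIds.addAll(page.shardIds);
            exclusiveStartShardId = page.lastEvaluatedShardId;
        } while (exclusiveStartShardId != null);
        return allShardIds;
    }
}
```

Note that this still issues one DescribeStream call per page, so it would not by itself avoid the DescribeStream limits mentioned above.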

Overwrites Endpoint Set

I've been trying to use this with DynamoDB Local. I set the endpoint to my localhost and the correct port, but the KCL keeps hitting us-west-2.

I tried passing an immutable DynamoDB Streams client to AmazonDynamoDBStreamsAdapterClient. This causes the immutable client error (can't set).
