After shard splitting, our log is flooded with warning messages "Cannot find the shard given shardId" (amazon-kinesis-client, OPEN)

awslabs commented on September 26, 2024 8
After shard splitting, our log is flooded with warning messages "Cannot find the shard given shardId"

from amazon-kinesis-client.

Comments (20)

pfifer commented on September 26, 2024 14

Just letting people know that we are aware of this. We're looking into fixing this, but I don't have an ETA at this time.

shawnsmith commented on September 26, 2024 12

I ran into this using DynamoDB Streams without explicit shard splitting occurring (just the usual DynamoDB cycling of the shard as @matthewbogner described). FWIW, here is the sequence we encountered that triggered the warnings. With DynamoDB Streams this occurs pretty often--at any given point in time there's almost always at least one of our servers in this state where it's logging these warnings every 2 seconds. We've had to turn off WARN for KinesisProxy and ProcessTask.

Assume a DynamoDB stream with shard S1 and two stream workers A and B using the KCL (we aren't using the KPL):

  1. At the start, consumer A owns a lease on shard S1, consumer B is idle because no leases are available.
  2. At some point, DynamoDB closes shard S1 and creates a child shard S2 whose parent is S1.
  3. A reaches the end of S1.
  4. A syncs the shard set with the DynamoDB lease table, creating a new lease for S2.
  5. A obtains the lease for S2. It hasn't yet cleaned up the lease for S1.
  6. B wakes up, notices 2 leases in the lease table both owned by A (S1 and S2), and steals the S2 lease from A (code).
  7. A notices that S2 lease has been lost, becomes idle.
  8. B begins processing records in S2.
  9. B logs warnings because it did not execute a code path that would cause it to re-sync its cached list of shards to include S2 (code).
    • KinesisProxy initializes its cached shard list on startup
    • KinesisProxy cached shard list is refreshed upon reaching the end of a shard
    • KinesisProxy cached shard list is NOT refreshed on lease steal
  10. B continues to log warnings until it reaches the end of shard S2.
  11. .. at which point, A may steal the lease for the new S3 and begin logging warnings.
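
The sequence above can be replayed in a toy model. This is only a sketch, not KCL code: `ToyStream`, `ToyWorker`, and `LeaseStealDemo` are hypothetical names, and the only behaviour modelled is the three cache rules listed under step 9 (refresh on startup, refresh at the end of a shard, no refresh on lease steal).

```java
import java.util.HashSet;
import java.util.Set;

/** Replays steps 1-10 with two toy workers. Hypothetical names; not KCL classes. */
class LeaseStealDemo {
    /** Returns the total number of warnings worker B logs. */
    static int run() {
        ToyStream stream = new ToyStream();
        stream.shards.add("S1");

        // Step 1: both workers start and cache the shard list {S1}.
        ToyWorker a = new ToyWorker("A", stream);
        ToyWorker b = new ToyWorker("B", stream);

        stream.shards.add("S2");  // step 2: child shard S2 appears
        a.onShardEnd();           // steps 3-4: A reaches the end of S1 and re-syncs
        a.onLeaseAcquired("S2");  // step 5
        b.onLeaseAcquired("S2");  // step 6: B steals S2; its cache is still {S1}

        for (int pass = 0; pass < 3; pass++) {
            b.process("S2");      // steps 8-9: every pass misses the stale cache
        }

        b.onShardEnd();           // step 10: reaching the end of S2 refreshes B's cache
        b.process("S2");          // S2 is now in the cache, so no further warning
        return b.warnings;
    }

    public static void main(String[] args) {
        System.out.println("warnings logged by B: " + run());
    }
}

/** Toy stream: just the set of shard IDs that currently exist. */
class ToyStream {
    final Set<String> shards = new HashSet<>();
}

/** Toy worker whose cached shard list follows the three rules under step 9. */
class ToyWorker {
    private final String name;
    private final ToyStream stream;
    private final Set<String> cachedShards = new HashSet<>();
    int warnings = 0;

    ToyWorker(String name, ToyStream stream) {
        this.name = name;
        this.stream = stream;
        refreshCache(); // rule 1: the cache is initialized on startup
    }

    private void refreshCache() {
        cachedShards.clear();
        cachedShards.addAll(stream.shards);
    }

    /** Rule 2: the cache is refreshed upon reaching the end of a shard. */
    void onShardEnd() {
        refreshCache();
    }

    /** Rule 3: acquiring or stealing a lease does NOT refresh the cache. */
    void onLeaseAcquired(String shardId) {
        // Intentionally empty: this is the gap described in step 9.
    }

    /** One processing pass: warn if the shard is missing from the cached list. */
    boolean process(String shardId) {
        if (!cachedShards.contains(shardId)) {
            warnings++;
            System.out.println(name + " WARN Cannot find the shard given shardId " + shardId);
            return false;
        }
        return true;
    }
}
```

In this model B warns on each of its three passes over S2 and stops only once it reaches the end of the shard, which mirrors steps 9 and 10.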

adrian-baker commented on September 26, 2024 12

From 2016:

"Just letting people know that we are aware of this. We're looking into fixing this, but I don't have an ETA at this time."

Is this still the case?

igracia commented on September 26, 2024 3

Thanks @joshua-kim! We have several consumers using the DynamoDB Streams Kinesis adapter on a single shard, and we are still getting this with the following versions:

  • dynamodb-streams-kinesis-adapter 1.5.1
  • amazon-kinesis-client 1.13.3

Bumping those versions makes it all stop working, so we're stuck with them for the time being. Also, as per this issue in dynamodb-streams-kinesis-adapter, we can't use v2. Any suggestions would be appreciated!

adrian-skybaker commented on September 26, 2024 2

The KCL dev flow has been quite stable in the many years I've been using it.

  1. wire in the KCL library
  2. be surprised about how much boilerplate handling code is required, without much supporting documentation, particularly on how to handle errors safely
  3. be alarmed about sporadic, opaque but continual warnings logged in your production deployments
  4. spend time googling and pursuing old open github issues with unclear resolutions
  5. give up and set log level to ERROR and cross your fingers. Hopefully you're not dealing with a domain where data loss is a serious issue. Or switch to Lambda.

xujiaxj commented on September 26, 2024 1

@amanduggal we modified our logback setting to suppress the warning message

<logger name="com.amazonaws.services.kinesis.clientlibrary.proxies.KinesisProxy" level="ERROR"/>

adrian-baker commented on September 26, 2024 1

Unsure why this is labelled an enhancement?

klesniewski commented on September 26, 2024 1

Any updates on this? If I understood correctly from @shawnsmith's analysis, the solution is to refresh cached shard list on lease steal?
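
As a rough illustration of that idea, here is a minimal sketch of a cached shard list that re-lists shards when a lease is acquired for a shard it does not know about. It is not taken from the KCL and is not a proposed patch: `CachedShardList`, `onLeaseAcquired`, and the `Supplier` standing in for the shard-listing call are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Demonstrates a stale cache miss, then a hit after a refresh on lease acquisition. */
class RefreshOnLeaseDemo {
    /** Returns the number of cache misses (would-be warnings) observed. */
    static int run() {
        Set<String> liveShards = new HashSet<>();
        liveShards.add("S1");

        // The supplier stands in for whatever call lists the stream's shards.
        CachedShardList cache = new CachedShardList(() -> new HashSet<>(liveShards));

        liveShards.add("S2"); // child shard created after the cache was built

        int misses = 0;
        if (!cache.contains("S2")) {
            misses++;                 // stale cache: this lookup misses
        }
        cache.onLeaseAcquired("S2");  // hypothetical hook: refresh on lease steal
        if (!cache.contains("S2")) {
            misses++;                 // not reached: the refresh picked up S2
        }
        return misses;
    }

    public static void main(String[] args) {
        System.out.println("cache misses: " + run());
    }
}

/** Hypothetical cached shard list that refreshes when an unknown shard is leased. */
class CachedShardList {
    private final Supplier<Set<String>> listShards;
    private Set<String> cached;

    CachedShardList(Supplier<Set<String>> listShards) {
        this.listShards = listShards;
        this.cached = listShards.get(); // initial load on startup
    }

    boolean contains(String shardId) {
        return cached.contains(shardId);
    }

    /** Re-list shards only if the newly leased shard is missing from the cache. */
    void onLeaseAcquired(String shardId) {
        if (!cached.contains(shardId)) {
            cached = listShards.get();
        }
    }
}
```

Refreshing only when the leased shard is unknown keeps the extra listing calls to one per stolen lease in this sketch; whether that fits the real lease-taking code path is an open question in this thread.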

mrhota commented on September 26, 2024 1

Just copying this over from the linked issue. @pfifer Do you have any updates or insight here?

I think I have the same issue, although we also see non-stop ERROR logs like:

ERROR [2020-02-27 13:02:45,382] [RecordProcessor-2873] c.a.s.k.c.lib.worker.InitializeTask: Caught exception: 
com.amazonaws.services.kinesis.clientlibrary.exceptions.internal.KinesisClientLibIOException: Unable to fetch checkpoint for shardId shardId-00000001582460850801-53f6f94b
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibLeaseCoordinator.getCheckpointObject(KinesisClientLibLeaseCoordinator.java:286)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitializeTask.call(InitializeTask.java:82)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

And I found an AWS Dev forum link related to this issue here: https://forums.aws.amazon.com/thread.jspa?messageID=913872

mrhota commented on September 26, 2024 1

@pfifer any ideas? any updates? any anything?

aldilaff commented on September 26, 2024 1

@pfifer any updates on this?

dacevedo12 commented on September 26, 2024 1

Same problem 6 years later 😞

I'm using amazon-kinesis-client 1.13.3 with dynamodb-streams-kinesis-adapter 1.5.3.
This is especially annoying in combination with the already spammy MultilangDaemon.

aakavalevich commented on September 26, 2024

I have the same warning. Does anyone know how to fix it?

amanduggal commented on September 26, 2024

We are facing a similar issue. Any advice on a solution would be appreciated.

@xujiaxj just curious if you thought of anything since the bug filing?

matthewbogner commented on September 26, 2024

This is especially annoying when using the KCL to read a DynamoDB stream, which claims to split its shards every 4 hours according to this blog post by one of the DynamoDB engineers at AWS:

Typically, shards in DynamoDB streams close for writes roughly every four hours after they are created and become completely unavailable 24 hours after they are created.

https://blogs.aws.amazon.com/bigdata/post/TxFCI3UJJJYEXJ/Process-Large-DynamoDB-Streams-Using-Multiple-Amazon-Kinesis-Client-Library-KCL

ryanlewis commented on September 26, 2024

I've been testing the stack and looking at the sharding, and I've been noticing these errors, although everything still appears to work.

Forgive my newness to the technology, but is this something that we should be concerned about?

igracia commented on September 26, 2024

@aldilaff might want to check #185 too, in case the workerId is also bugging you.

joshua-kim commented on September 26, 2024

Cannot find the shard given the shardId

chgenvulgfjlejltgvglhecbucrihrcbbclfj

igracia commented on September 26, 2024

@joshua-kim was that a yubikey press? :-P Otherwise, can you please elaborate on why the issue is being closed and how to solve/prevent it?

joshua-kim commented on September 26, 2024

@igracia Sorry, yes that was a Yubikey press. I was referencing this issue when looking into another cached shard map issue in a fork of 1.6; I'm curious though, are you still seeing this on the latest 2.x/1.x releases? The latest releases are no longer using ListShards in most cases, so I'm curious to see if this bug is still present.
