Comments (4)
I've been looking at the source code for lease stealing. It does not appear to give the worker whose lease was stolen any time to notice that the lease is gone and stop processing before the new worker starts duplicating work and wreaking havoc on shared resources.
At the very least, after any leases are stolen (as opposed to adopting expired leases), the new worker should wait at least LeaseTaker.leaseDurationNanos before attempting to process any records for that shard, so that the old owner has an opportunity to notice that it no longer holds the lease.
from amazon-kinesis-client.
The KCL provides at-least-once semantics. In cases such as failover and load balancing, data can be delivered more than once, and there can be a (typically small) window of time when multiple record processors are assigned to the same shard. We recommend designing your application to be idempotent.
In your scenario, is it possible for your record processor to sleep (for up to the failover time) as part of initialize() when you detect that the old record processor may still be using the resource?
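A minimal sketch of that idea, written in plain Java rather than against the real KCL interfaces. The class, its initialize() signature, and the previousOwnerMayBeActive check are hypothetical stand-ins, and the failover time is assumed to be supplied by the application. The processor waits while the check reports that the old processor may still be active, but never much longer than the failover time.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a record processor; NOT the real KCL interface.
class DelayedStartProcessor {
    private final long failoverTimeMillis;   // assumed to mirror the worker's failover setting
    private final long pollIntervalMillis;   // how often to re-check the old owner
    private final BooleanSupplier previousOwnerMayBeActive; // hypothetical application-level check

    DelayedStartProcessor(long failoverTimeMillis,
                          long pollIntervalMillis,
                          BooleanSupplier previousOwnerMayBeActive) {
        this.failoverTimeMillis = failoverTimeMillis;
        this.pollIntervalMillis = pollIntervalMillis;
        this.previousOwnerMayBeActive = previousOwnerMayBeActive;
    }

    /** Waits before processing starts and returns the time waited, in milliseconds. */
    long initialize(String shardId) {
        // shardId is unused here; a real check would likely be per shard.
        long start = System.nanoTime();
        long maxWaitNanos = failoverTimeMillis * 1_000_000L;
        while (previousOwnerMayBeActive.getAsBoolean()) {
            long remainingNanos = maxWaitNanos - (System.nanoTime() - start);
            if (remainingNanos <= 0) {
                break; // failover time has elapsed; start processing regardless
            }
            long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
            try {
                Thread.sleep(Math.min(pollIntervalMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

If the check never clears, the wait ends once the failover time has passed, so a crashed old owner cannot block the new processor indefinitely.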
I'd recommend starting a thread on our forum (https://forums.aws.amazon.com/forum.jspa?forumID=169) with some more details about your use case/application. We are better able to answer application design questions via the forums.
Sincerely,
Gaurav
"At least once" delivery to a single instance of a record processor is one thing. That can easily be handled by checkpointing, etc. Having multiple active readers on the same data at the same time is entirely different. It requires having some sort of database interlock between all potential readers on any output operation that can't be undone.
Consider, for instance, a simple copier that reads from one Kinesis stream and writes to another. There is no way to make such an operation idempotent.
If there is only one reader on each input shard, it is straightforward to use a second DynamoDB table to keep track of the last input sequence number that was written to the output stream. If the current input sequence number is no greater than the last recorded sequence number, it must be a restart from checkpoint, and so the output can be discarded.
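The single-reader bookkeeping described above might look roughly like this. It is only a sketch: the in-memory map stands in for the second DynamoDB table, the list stands in for the output stream, and the class and method names are hypothetical. Kinesis sequence numbers are numeric strings, so they are compared here as BigInteger values, and records at or below the stored sequence number are treated as replays.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-reader copier; the collections replace real AWS calls.
class SingleReaderCopier {
    // Stand-in for the second DynamoDB table: shard ID -> last input sequence number written.
    private final Map<String, BigInteger> lastWritten = new HashMap<>();
    // Stand-in for the output stream.
    private final List<String> output = new ArrayList<>();

    /** Copies the record unless it is a replay; returns true if it was written. */
    boolean copy(String shardId, String sequenceNumber, String data) {
        BigInteger seq = new BigInteger(sequenceNumber);
        BigInteger last = lastWritten.get(shardId);
        if (last != null && seq.compareTo(last) <= 0) {
            return false; // already written before the restart; discard
        }
        output.add(data);              // write to the output stream
        lastWritten.put(shardId, seq); // then record the progress
        return true;
    }

    int outputSize() {
        return output.size();
    }
}
```

Note that the write and the progress update are two separate steps, so a crash between them would still replay one record on restart.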
However, if there are multiple readers on the same shard of the input stream, this straightforward approach is not possible, as they will be competing with one another to write the same data to the output. Expensive inter-instance critical sections need to be written so that no other machine can interleave with the sequence of checking the last sequence number, writing to the output stream, and updating the last sequence number.
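To make the shape of such a critical section concrete, here is a sketch in which a ReentrantLock stands in for whatever inter-instance lock would be used in practice. That substitution is an assumption for illustration only: an in-process lock does not protect against other machines, and the class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical copier for ONE shard, shared by several competing readers.
class LockedCopier {
    // In-process stand-in for a lock that all instances would have to share.
    private final ReentrantLock sharedLock = new ReentrantLock();
    private BigInteger lastWritten = null;                 // stand-in for the tracking table entry
    private final List<String> output = new ArrayList<>(); // stand-in for the output stream

    /** Check, write, and update as one critical section; returns true if written. */
    boolean copy(String sequenceNumber, String data) {
        BigInteger seq = new BigInteger(sequenceNumber);
        sharedLock.lock();
        try {
            if (lastWritten != null && seq.compareTo(lastWritten) <= 0) {
                return false; // another reader (or an earlier run) already wrote it
            }
            output.add(data);
            lastWritten = seq;
            return true;
        } finally {
            sharedLock.unlock();
        }
    }

    int outputSize() {
        sharedLock.lock();
        try {
            return output.size();
        } finally {
            sharedLock.unlock();
        }
    }
}
```

Every record pays for a lock acquisition here, which is the cost being objected to above.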
Even with a single reader, you have to design for scenarios where the reader may have produced an output but then died before it was able to advance the checkpoint (or the second DynamoDB table you mention). Once you've designed for that, you can typically use the same mechanism to handle the multiple-reader scenario.
In the simple copier scenario you mention above, you'll want the consumer of your second stream to be idempotent as well. For example, if the copier makes a call to Kinesis to put a record and the call times out, you'll want to retry (to avoid data loss). If your first call had actually succeeded, then the retry will put a duplicate record. You'll want to design the consumer to handle such duplicates. Once the consumer is idempotent, you can rely on that idempotency mechanism to handle the multiple reader/copier scenario.
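One way such a consumer could look, as a sketch only: it assumes each copied record carries the shard ID and sequence number it had in the source stream, and uses that pair as a deduplication key. The class name, the key format, and the in-memory set are all hypothetical; a real consumer would likely need a durable, bounded store instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical idempotent consumer of the second stream.
class IdempotentConsumer {
    // Keys of records already applied (in-memory stand-in for a durable store).
    private final Set<String> seen = new HashSet<>();
    // Stand-in for the side effects of processing a record.
    private final List<String> applied = new ArrayList<>();

    /** Applies the record once; duplicates from retries or extra copiers are ignored. */
    boolean handle(String sourceShardId, String sourceSequenceNumber, String payload) {
        String key = sourceShardId + ":" + sourceSequenceNumber;
        if (!seen.add(key)) {
            return false; // duplicate delivery
        }
        applied.add(payload);
        return true;
    }

    int appliedCount() {
        return applied.size();
    }
}
```

With this in place, it does not matter whether a duplicate came from a retried put or from two copiers briefly reading the same shard.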
Sincerely,
Gaurav