Comments (5)
I was able to dig a little deeper and found that these SCHEDULED tasks are not popped because the RedisMock class's zAdd method returns 0 for these tasks on the HTTP UNACK queue. That causes the task to be skipped in the _pop method of RedisDynoQueue. Can anyone explain what the unack queue is and how it is used? Is it used to check whether the task was ack'ed?
For some tasks, returning 0 from the zAdd method does not cause issues, as I guess further calls return a non-zero value. But for these tasks 0 is always returned.
from dyno-queues.
More info. Seems to be a bug with RedisSortedSetCache. It is trying to maintain two maps, the cache map and the scores map. The scores map value is a Map and the cache map value is a Set. When tasks are added to the UNACK.HTTP cache after they have already been added before, a duplicate entry is added to the cache map's Set but the scores entry is just overwritten (since the put does not allow dups). Then when the entry is removed, only one is removed from the cache, leaving one (or more) and the scores entry is deleted entirely.
The Comparator of the cache map does comparisons using the scores map, thinking that there are equivalent entries in both maps. If it finds two scores that are null it will return 0 (equal).
Since the SortedSet contains() method uses the comparator, even though the set does not contain the task, it thinks it does because of the null-scores comparison.
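To illustrate the comparator behaviour described above, here is a minimal sketch. It is not the real RedisSortedSetCache code: the class name, the `byScore` comparator and the task ids are hypothetical, and a plain `TreeSet` stands in for the cache map's sorted Set. It only assumes what is described in this comment, namely that the comparator looks up scores in a separate map and returns 0 when both are null.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for the cache/scores pair; not the actual dyno-queues class.
class NullScoreComparatorDemo {

    static boolean staleEntryMatchesUnknownTask() {
        // Stand-in for the scores map: task id -> score.
        final Map<String, Double> scores = new HashMap<>();

        // Assumed comparator shape: order by score, and treat two tasks
        // that both have no score as equal.
        Comparator<String> byScore = (a, b) -> {
            Double sa = scores.get(a);
            Double sb = scores.get(b);
            if (sa == null && sb == null) return 0; // the problematic case
            if (sa == null) return -1;
            if (sb == null) return 1;
            int c = sa.compareTo(sb);
            return c != 0 ? c : a.compareTo(b);
        };

        // Stand-in for the cache map's sorted Set.
        TreeSet<String> members = new TreeSet<>(byScore);

        scores.put("task-A", 1.0);
        members.add("task-A");

        // Simulate the asymmetric remove: the score entry is deleted,
        // but the member is left behind in the set.
        scores.remove("task-A");

        // "task-B" was never added, yet the lookup compares it with the
        // stale "task-A", sees two null scores, and reports a match.
        return members.contains("task-B");
    }

    public static void main(String[] args) {
        System.out.println(staleEntryMatchesUnknownTask()); // prints "true"
    }
}
```

The point of the sketch is that `TreeSet.contains()` relies entirely on the comparator, so a comparator that depends on mutable state outside the set can report membership for an element that was never added.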
Bottom line on this is that this causes SCHEDULED tasks to be stranded forever and never popped. This is a big problem.
NOTE THAT this bug only happens when more than one HTTP system task is SCHEDULED in parallel and only sporadically. I think it is a race condition but I am not sure yet as this is very difficult to debug. I am also not sure if it only happens with HTTP tasks.
Here’s my debugging history:
In the Redis persistence implementation of conductor, the DynoQueueDAO uses RedisDynoQueue as its queue impl. RedisDynoQueue uses JedisMock as its JedisCommands impl. JedisMock uses RedisMock for its zadd and zrem impl if the params contain “nx”, which they do.
RedisMock uses the RedisSortedSetCache as its ‘zsetCache’. That class has two maps – cache and scores. The cache has a Comparator that doesn’t just compare the task ids, it also compares the scores in the scores map for the two task ids of the comparison.
It appears as though the cache entry that is causing the problem is the “conductor_queues.test.UNACK.HTTP.c” cache.
When the pop() method of DynoQueueDAO is called, it calls the pop() method of the “conductor_queues.test.UNACK.HTTP.c” RedisDynoQueue which calls its internal _pop() method. In its _pop() method, it does this for each task in the UNACK.HTTP queue:
    long added = quorumConn.zadd(unackQueueName, unackScore, msgId, zParams);
    if (added == 0) {
        monitor.misses.increment();
        continue;
    }
So if ‘added=0’, that task will not be added to the tasks to pop. In the zadd method of RedisMock, the contains() method of the cache map's Set is used to see if the task exists in the cache. If it does, zadd returns 0. The contains() method uses the Comparator set up in the RedisSortedSetCache.set() method, which doesn’t just compare the task id against the others in the set; it also returns 0 (equal) if the scores for both tasks are null, which they are at that point. So zadd returns 0 because both scores are null, and the task is not popped.
Note that the false match only happens the first time zadd is called. After the contains() method is called to determine if the task is in the queue, zadd then puts it in the queue with the zsetCache.set() call, so from then on it does actually exist and zadd will always return 0. The task stays in there forever and never gets popped.
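The sequence above can be sketched end to end as follows. This is a simplified model under stated assumptions, not the real RedisMock or RedisDynoQueue code: `UnackPopSketch`, `zaddNx`, `leaveStaleMember` and the task ids are hypothetical names, and `zaddNx` stores the member even when the lookup reports it present because that is what the description above says happens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, simplified model of the mocked UNACK sorted set.
class UnackPopSketch {
    private final Map<String, Double> scores = new HashMap<>();
    private final TreeSet<String> members = new TreeSet<>(new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            Double sa = scores.get(a);
            Double sb = scores.get(b);
            if (sa == null && sb == null) return 0; // two unknown scores count as equal
            if (sa == null) return -1;
            if (sb == null) return 1;
            int c = sa.compareTo(sb);
            return c != 0 ? c : a.compareTo(b);
        }
    });
    int misses = 0;

    // Simulates the asymmetric remove: member stays, score is deleted.
    void leaveStaleMember(String id) {
        scores.put(id, 0.0);
        members.add(id);
        scores.remove(id);
    }

    // Assumed NX-style add: returns 0 if the lookup says the member exists,
    // 1 otherwise; the member is stored either way, as described in the thread.
    long zaddNx(String member, double score) {
        boolean present = members.contains(member);
        scores.put(member, score);
        members.add(member);
        return present ? 0 : 1;
    }

    // Mirrors the quoted _pop() loop: a zadd result of 0 counts as a miss.
    List<String> pop(List<String> candidates, double unackScore) {
        List<String> popped = new ArrayList<>();
        for (String msgId : candidates) {
            long added = zaddNx(msgId, unackScore);
            if (added == 0) {
                misses++;
                continue;
            }
            popped.add(msgId);
        }
        return popped;
    }

    // Returns {popped on first try, popped on second try, total misses}.
    static int[] demo() {
        UnackPopSketch q = new UnackPopSketch();
        q.leaveStaleMember("old-task");
        int first = q.pop(Collections.singletonList("new-task"), 5.0).size();
        int second = q.pop(Collections.singletonList("new-task"), 6.0).size();
        return new int[] {first, second, q.misses};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "0 0 2"
    }
}
```

In this model the first attempt is a false match against the stale member, and every later attempt is a genuine match because the first attempt stored the task, so the task is never returned by `pop`. Without the stale member, the same `pop` call returns the task on the first attempt.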
I am not sure of the best way to fix this yet. I am looking for advice.
If this is a Dyno Queues issue, I suggest you open an issue on https://github.com/netflix/dyno-queues so that the proper team can handle it.
Hi @rickfish, is this problem solved? I also encountered the same problem with conductor version 2.29.0 and redis 5.0.5. If you solved it, I would like to know how. What should I do?
Not sure if it was solved. Since we are now using Postgres as the db, it only affects us when running on somebody's laptop using db=memory. In that case, it would just be a developer testing his/her changes, so this kind of problem would not be noticed. In any case, I think it is a DynoQueue issue with RedisSortedSetCache and would have to be resolved in that open-source project.
Sorry I could not help more.
Related Issues (18)
- HTTP Task remains in scheduled state
- Misleading exception thrown when dynamo can't connect to any node in cluster
- Message priority is ignored
- “Multiple consumers” is implemented or not ?
- schedulerForPrefetchProcessing thread in RedisQueue class is not used anywhere
- Any plan to add multiple consumer functionality on queues ?
- RedisDynoQueue.pop() not returning HTTP tasks that are in the queue
- Testing workflow/task with priority
- Dyno Queue pop doesnt work as expected
- how to use RedisPipelineQueue
- How does dyno-queues make sure that message would be consumed only once.
- Prefetching ids multiple times leading to duplication.
- processUnack not called after re-establishing connection to Redis DynoQueue
- Cannot remove taskId from the queue shard conductor.test.QUEUE.HTTP.a
- Performance Issues while using dyno-queues over dynomite
- Operation: ( remove ) failed on key: [(a shard in) DECISION ]
- Unit test issue: com.netflix.dyno.queues.redis.v2.RedisDynoQueueTest > testTimeoutUpdate FAILED