
Comments (6)

bpo commented on June 11, 2024

> I assume this is intermittent?

Yes, one crash after ~1 day of uptime.

> Do you have any modules loaded?

No modules

> Can you give me an idea of what other commands may be in flight around the same time?

Counts of the last command executed, per client:

   2 blpop
 224 brpop
   8 evalsha
  38 exec
  18 get
   1 replconf
  16 scard
   1 script
   2 set
   1 slowlog
   1 strlen
   2 xadd
1689 xreadgroup
   1 zrangebyscore

from keydb.

JohnSully commented on June 11, 2024

This is fixed in commit aae0fdc, which is now in RELEASE_5.

The root cause is that shared objects such as shared.rpop aren't marked as shared, so their refcount is incremented and decremented like any other object's. This isn't a problem with a single thread; however, in a multithreaded environment client arguments must be either local to the current thread or marked shared, because the refcount is modified outside the lock.

Eventually, because of this unsynchronized access, shared.rpop's refcount hits zero and the object is freed. At some later point malloc recycles the memory and places a new object there. At that point the command lookup fails and this assert fires.

The fix is to correctly mark these and all other similar objects as shared.
I will continue to run my stress test for a few days to further validate the fix.


JohnSully commented on June 11, 2024

I assume this is intermittent?

I see blockingPopGenericCommand in the call stack. Can you give me an idea of what other commands may be in flight around the same time? I will set up a torture test to suss it out.

Also, FYI: RELEASE_5 isn't officially released yet, so I've been updating it silently with bug fixes (though none that I think would resolve this). At this point I will only add bug fixes, not new features, so I do appreciate you testing it.


JohnSully commented on June 11, 2024

Do you have any modules loaded?

I think I may have found a code path that could cause this if modules are used.


bpo commented on June 11, 2024

The same machine was periodically showing these in the logs for a handful of fd values; it's not clear to me whether it's related.

epoll_ctl failed: No such file or directory
2881:M 14 Jul 2019 02:33:43.201 # Error registering fd event for the new client: No such file or directory (fd=290)

This is reminiscent of redis/redis#2479, but Sentinel is not involved and we did not see any connectivity issues on the replica (prior to the master crash, that is).


JohnSully commented on June 11, 2024

It's tricky because argv[0] should be the hardcoded value "rpop" but is instead ">". From reading the code, this should be impossible.

It's set to that value not far away in blockingPopGenericCommand:

rewriteClientCommandVector(c,2,
                        (where == LIST_HEAD) ? shared.lpop : shared.rpop,
                        c->argv[j]);

I have a few boxes dedicated to fuzzing; I'll focus them on brpop.
