
Comments (17)

mronstro commented on May 28, 2024

Thx for the detailed report. We will look into the issue; we are about to release 22.10.1 and will make sure the issues related to this are fixed there. It is correct that one needs at least 21.04.9 to upgrade to 22.10.0.

from rondb.

m-g-r commented on May 28, 2024

Thanks Mikael!

Btw. this is running the new RonDB backend for Dydra, which now passes our complete test suite at https://github.com/dydra/http-api-tests/. The backend is based on our Common Lisp bindings for the NDB API, which are by now complete enough to support that backend.

From Berlin, Max


mronstro commented on May 28, 2024

Thx Max,
This is very interesting information; it's great to hear about new use cases for RonDB.
Let us also know if there are features that would be useful to you in future RonDB development.


m-g-r commented on May 28, 2024

Hi Mikael, any news here? Was any work done in that direction for 22.10.1?

I noticed that 22.10.1 is not released yet, but it is surely in an almost stable state, right? Should we try to upgrade directly to 22.10.1? It is still an experimental environment for us and not production-level, so we could even live with data loss and reload our data in that case. (Even more so as the import with the RonDB backend is now much faster than with our existing LMDB-based backends.)

Max


mronstro commented on May 28, 2024


m-g-r commented on May 28, 2024

Hi again, that worked much better, on the second attempt.

In the first attempt it ran out of memory:

ndb_2_out.log ended with:

2024-01-09 19:36:04 [ndbd] ERROR    -- Global memory manager is out of memory completely, no memory in shared global memory left and no memory in reserved memory that we can steal either.
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
stack_bottom = 0 thread_stack 0x0
ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0xa8f2a1]
ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x37) [0x955077]
ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0x106) [0xa505a6]
ndbmtd(Ndbcntr::execSYSTEM_ERROR(Signal*)+0xac) [0x8169fc]
ndbmtd() [0xa7754c]
ndbmtd(mt_job_thread_main+0x37b) [0xa7994b]
ndbmtd() [0xa10842]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f14a76ec609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f14a6d74133]
2024-01-09 19:36:05 [ndbd] ERROR    -- Global memory manager is out of memory completely, no memory in shared global memory left and no memory in reserved memory that we can steal either. - Repeated 45 times
2024-01-09 19:36:05 [ndbd] INFO     -- Killed by node 2 as copyfrag failed, error: 827
2024-01-09 19:36:05 [ndbd] INFO     -- NDBCNTR (Line: 380) 0x00000006
2024-01-09 19:36:05 [ndbd] INFO     -- Error handler shutting down system
2024-01-09 19:36:08 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Occurred during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

and in the cluster management console it said:

ndb_mgm> Node 2: Data usage increased to 80%(140284 32K pages of total 175214)
Node 2: Data usage increased to 90%(156283 32K pages of total 173448)
Node 2: Index usage increased to 80%(16761 32K pages of total 20764)
Node 2: Index usage increased to 90%(16921 32K pages of total 18673)
Node 2: Data usage increased to 99%(170917 32K pages of total 172292)
Node 2: Forced node shutdown completed. Occurred during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
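
The console lines above report usage in 32K pages, which is awkward to compare with the MByte figures in the node log. A minimal sketch for converting them, assuming the message format is exactly as pasted above (the regular expression and the helper name `parse_usage` are illustrative and not part of any RonDB tool):

```python
import re

# Pattern for lines such as:
#   Node 2: Data usage increased to 80%(140284 32K pages of total 175214)
# The format is taken from the console output above; other versions may differ.
USAGE_RE = re.compile(
    r"Node (?P<node>\d+): (?P<kind>Data|Index) usage "
    r"(?:increased|decreased) to (?P<pct>\d+)%"
    r"\((?P<used>\d+) 32K pages of total (?P<total>\d+)\)"
)

PAGE_KIB = 32  # each page is 32 KiB


def parse_usage(line):
    """Return node, kind, sizes in MiB and the recomputed ratio, or None if no match."""
    m = USAGE_RE.search(line)
    if m is None:
        return None
    used, total = int(m["used"]), int(m["total"])
    return {
        "node": int(m["node"]),
        "kind": m["kind"],
        "reported_pct": int(m["pct"]),
        "used_mib": used * PAGE_KIB / 1024,
        "total_mib": total * PAGE_KIB / 1024,
        "ratio": used / total,
    }


if __name__ == "__main__":
    line = "ndb_mgm> Node 2: Data usage increased to 99%(170917 32K pages of total 172292)"
    info = parse_usage(line)
    print(f"node {info['node']} {info['kind']}: "
          f"{info['used_mib']:.0f} of {info['total_mib']:.0f} MiB ({info['ratio']:.1%})")
```

Read this way, the data page pool on node 2 was only a bit over 5 GiB in total when the node was shut down.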

I deleted the biggest table, got the message:

ndb_mgm> Node 1: Data usage decreased to 65%(433721 32K pages of total 660940)

and tried again.

Now it came up and I could update the whole cluster.

I still notice that I cannot load as much data as before. I tried to load a bigger file (but still smaller than the data that was loaded before) and got:

Error with code 827: Out of memory in Ndb Kernel, table data (increase DataMemory)

while the management console told me:

ndb_mgm> Node 2: Data usage increased to 80%(139381 32K pages of total 173993)

So, I guess the automatic memory configuration changed and that causes the problems...

Yes, that seems to be the problem: My config just contains:

TotalMemoryConfig=48G
SharedGlobalMemory=16G

RonDB-21.04.9 turned this into:

TransactionMemory is 1965 MBytes
SharedGlobalMemory is 16384 MBytes
Total memory is 49152 MBytes
Used memory is 25714 MBytes
Remaining memory is 23437 MBytes
Setting DataMemory to 21093 MBytes

while RonDB-22.10.1 makes:

TransactionMemory is 15439 MBytes
SharedGlobalMemory is 16384 MBytes
Total memory is 49152 MBytes
Used memory is 42581 MBytes
Remaining memory is 6570 MBytes
Setting DataMemory to 5913 MBytes

I'll reconfigure and then it should work. So the changed automatic memory configuration seems to be the only caveat so far. Nice!
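
The two startup logs above can be compared directly. The sketch below only redoes arithmetic on the logged numbers; the observation that DataMemory comes out at about 90% of the remaining memory is inferred from these figures, not taken from the RonDB source, and the helper names are hypothetical.

```python
# Values in MBytes, copied from the two startup logs above.
LOGS = {
    "21.04.9": {"total": 49152, "transaction": 1965, "used": 25714,
                "remaining": 23437, "data": 21093},
    "22.10.1": {"total": 49152, "transaction": 15439, "used": 42581,
                "remaining": 6570, "data": 5913},
}


def data_share(log):
    """Fraction of the remaining memory that ended up as DataMemory."""
    return log["data"] / log["remaining"]


def growth(old, new, key):
    """Increase of one logged figure between two versions, in MBytes."""
    return new[key] - old[key]


if __name__ == "__main__":
    old, new = LOGS["21.04.9"], LOGS["22.10.1"]
    for version, log in LOGS.items():
        print(f"{version}: DataMemory is {data_share(log):.1%} of remaining memory")
    tm = growth(old, new, "transaction")
    used = growth(old, new, "used")
    print(f"TransactionMemory grew by {tm} MBytes, used memory by {used} MBytes")
    print(f"Growth not explained by TransactionMemory: {used - tm} MBytes")
```

On these figures TransactionMemory accounts for 13474 of the 16867 MBytes of additional used memory, so roughly 3.4 GB of the increase must come from other settings.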


m-g-r commented on May 28, 2024

Update: configuring TransactionMemory explicitly did not help, as the new RonDB still increases it to a high value of a good 12 GB.
But I figured out that I do not need to configure SharedGlobalMemory in that case, and that turned out to work well.

It is only a bit surprising that the two data nodes come up with different results, after I restarted both.

Node 1:

Total memory is 49152 MBytes
Used memory is 23807 MBytes
Remaining memory is 25344 MBytes
Setting DataMemory to 22810 MBytes
Setting DiskPageBufferMemory to 2450 MBytes

Node 2:

Total memory is 49152 MBytes
Used memory is 28747 MBytes
Remaining memory is 20404 MBytes
Setting DataMemory to 18364 MBytes
Setting DiskPageBufferMemory to 1973 MBytes
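
To see where the two nodes actually diverge, the logged figures can be put side by side. This is only arithmetic on the numbers above; the helper names are hypothetical.

```python
# Values in MBytes, copied from the startup logs of the two data nodes above.
NODES = {
    1: {"total": 49152, "used": 23807, "remaining": 25344,
        "data": 22810, "disk_page_buffer": 2450},
    2: {"total": 49152, "used": 28747, "remaining": 20404,
        "data": 18364, "disk_page_buffer": 1973},
}


def shares(node):
    """Return (DataMemory, DiskPageBufferMemory) as fractions of remaining memory."""
    remaining = node["remaining"]
    return node["data"] / remaining, node["disk_page_buffer"] / remaining


def used_gap(a, b):
    """Difference in 'Used memory' between two nodes, in MBytes."""
    return b["used"] - a["used"]


if __name__ == "__main__":
    for node_id, node in NODES.items():
        data, dpb = shares(node)
        print(f"Node {node_id}: DataMemory {data:.1%}, DiskPageBufferMemory {dpb:.1%} of remaining")
    print(f"Used memory gap: {used_gap(NODES[1], NODES[2])} MBytes")
```

Both nodes split the remaining memory in the same proportions, so the whole difference traces back to the 4940 MBytes gap in used memory.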


mronstro commented on May 28, 2024

Looks like something I need to look into right away, could you provide the config.ini you used?


mronstro commented on May 28, 2024

Also would be great to see the full node log, at least the part about the memory allocation sizes.


mronstro commented on May 28, 2024

For both 21.04 and 22.10.1


mronstro commented on May 28, 2024

I also presume you use AutomaticThreadConfig=1, so it would also be interesting to know how many CPUs the machine has.


mronstro commented on May 28, 2024

The difference between the 2 nodes could happen if they have a different number of CPUs.


mronstro commented on May 28, 2024

Could reproduce with a very simple test. It seems that we changed memory configuration in a number of places to safeguard against running out of memory, but obviously we've been overcautious. However, the difference between node 1 and node 2 is harder to understand unless they have a different set of CPUs. I will look into the details of each of the differences and see what the best strategy is. The transaction memory is likely due to some configuration setting that you have used that affects the TransactionMemory (TM) calculations, so I need to see the config.ini to understand that part. In my test TM was equal in 21.04.16 and 22.10.1.


mronstro commented on May 28, 2024

As part of this fix I will also ensure that MaxNoOfConcurrentOperations also ensures that we can handle transactions of this size. This means that each operation will consume about 1.5kB of TransactionMemory, so setting it to 2M for example will set a minimum of 3G TransactionMemory.
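
The sizing rule described above is simple arithmetic. A minimal sketch, assuming "about 1.5kB" means 1536 bytes per operation and "2M" means 2 × 1024 × 1024 operations (both are readings of the comment, not values confirmed from the RonDB source; the function name is hypothetical):

```python
BYTES_PER_OPERATION = 1536  # assumption: "about 1.5kB" read as 1.5 KiB


def min_transaction_memory_bytes(max_concurrent_operations,
                                 bytes_per_operation=BYTES_PER_OPERATION):
    """Lower bound on TransactionMemory implied by MaxNoOfConcurrentOperations."""
    return max_concurrent_operations * bytes_per_operation


if __name__ == "__main__":
    ops = 2 * 1024 * 1024  # assumption: "2M" read as 2 Mi operations
    gib = min_transaction_memory_bytes(ops) / 1024**3
    print(f"MaxNoOfConcurrentOperations={ops} -> at least {gib:.1f} GiB TransactionMemory")
```

Under these assumptions the 2M example comes out at exactly 3 GiB, matching the figure in the comment.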


mronstro commented on May 28, 2024

PR for RONDB-581 created for memory configuration issue


m-g-r commented on May 28, 2024

Thank you! Most of this we already discussed by e-mail. While I could not move from 21.04.9 to 22.10.0, the upgrade to 22.10.1 has now worked successfully. And from your comments I conclude that we should be using 22.10.1 already.

> As part of this fix I will also ensure that MaxNoOfConcurrentOperations also ensures that we can handle transactions of this size. This means that each operation will consume about 1.5kB of TransactionMemory, so setting it to 2M for example will set a minimum of 3G TransactionMemory.

Good!

Adapting cl-ndbapi to 22.10.1 also seemed to be a minor change. I needed to enable -std=c++17 for my C/C++ wrapper, as the ndbapi now uses C++17 constructs. And there seem to be changes to NDB.set_eventbuf_max_alloc() and compare_ndbrecord, neither of which I use at the moment.

Is there some documentation about ndbapi changes between the two versions available?


m-g-r commented on May 28, 2024

Note: I've added NDB.set_eventbuf_max_alloc() again. Only compare_ndbrecord really changed in an incompatible fashion.

The signature changed from

      int compare_ndbrecord(const NdbReceiver *r1,
                            const NdbReceiver *r2,
                            const NdbRecord *key_record,
                            const NdbRecord *result_record,
                            bool descending,
                            bool read_range_no)

in rondb 21.04.9 to

      int compare_ndbrecord(const NdbReceiver *r1,
                            const NdbReceiver *r2,
                            const NdbRecord *key_record,
                            const NdbRecord *result_record,
                            const unsigned char *result_mask,
                            bool descending,
                            bool read_range_no)

in 22.10.1.

Well, it is probably meant to be internal. But it is in the interface and thus also picked up by SWIG.
