frankf1957 commented on June 2, 2024

elrond-node-0-dede76b030d3.log-20220705-1639.tar.gz

from mx-chain-go.

iulianpascalau commented on June 2, 2024

Are those nodes used to resolve API requests? If so, there are known issues that are fixed in the upcoming v1.3.33 binary release: https://github.com/ElrondNetwork/elrond-go/releases/tag/v1.3.33
ETA for the proper release: this week (04-10.07.2022).

frankf1957 commented on June 2, 2024

These are validator nodes only, and they are not full-copy nodes either.
These nodes are not used to resolve API requests.
I am running these nodes in support of the development community; I do not do any development myself on them.

iulianpascalau commented on June 2, 2024

Strange: the log does not contain any reference to a panic (usually one gets logged). The last line shows heap in-use at ~740 MB and heap idle at ~4.73 GB. Idle heap memory is not in active use and can be returned to the OS at any time.

Jul 05 16:38:59 elrond-devnet-en6-hel1-1 node[40876]: DEBUG[2022-07-05 16:38:59.208] [common/statistics]  [1/1177/1414589/(END_ROUND)] node statistics                          uptime = 2h22m3s timestamp = 1657039139 num go = 1195 alloc = 478.40 MB heap alloc = 478.40 MB heap idle = 4.73 GB heap inuse = 740.12 MB heap sys = 5.45 GB heap num objs = 1744127 sys mem = 5.78 GB num GC = 9817 FDs = 330 num opened files = 329 num conns = 88 accountsTrieDbMem = 0 B evictionDbMem = 0 B peerTrieDbMem = 0 B peerTrieEvictionDbMem = 0 B

iulianpascalau commented on June 2, 2024

My devnet machine, which hosts 4 nodes, uses around 7.9 GB according to htop. The real usage is almost certainly lower: since the OS is not running any other tasks and does not need the RAM for anything else, it does not bother to reclaim the idle heap.

frankf1957 commented on June 2, 2024

I will change my systemd unit files to add --log-save so that the nodes log to disk. If this happens again, the on-disk logs may contain details that do not get recorded in the systemd journal.

I will update this issue if the problem occurs again.

iulianpascalau commented on June 2, 2024

Great! Can you also add the -profile-mode flag?
It automatically creates a directory called healthRecords next to the node's binary and writes profile files into it, which we can use to trace the root causes of OOM events.

frankf1957 commented on June 2, 2024

Done, and the nodes have been restarted.

The ExecStart directives in the systemd unit files are now all set like this one:

ExecStart=/home/elrond/elrond-nodes/node-0/node -log-save -profile-mode -log-logger-name -log-correlation -log-level *:DEBUG -rest-api-interface localhost:8080

iulianpascalau commented on June 2, 2024

Looks good 👍
Thanks.

frankf1957 commented on June 2, 2024

I have another instance of a running node being killed by the kernel OOM killer.

Messages recorded in /var/log/syslog:

Jul  7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767489] apps.plugin invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Jul  7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767515]  oom_kill_process.cold+0xb/0x10
Jul  7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767669] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jul  7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767743] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/elrond-node-4.service,task=node,pid=54480,uid=1001
Jul  7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767888] Out of memory: Killed process 54480 (node) total-vm:26826588kB, anon-rss:23219180kB, file-rss:0kB, shmem-rss:4kB, UID:1001 pgtables:46008kB oom_score_adj:0
Jul  7 21:52:22 elrond-devnet-en6-hel1-1 kernel: [454781.563224] oom_reaper: reaped process 54480 (node), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB

I am attaching the log for node-4 for this occurrence.
elrond-go-2022-07-07-17-48-29.log.tar.gz
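To put the kernel's figures in perspective, the hypothetical Go helper below extracts the PID, process name, `total-vm` and `anon-rss` from an `Out of memory: Killed process` line like the one in the syslog excerpt above and converts them to GiB. The regular expression assumes exactly the field layout shown in this log; other kernel versions may format the line differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// oomKill holds the fields of a kernel "Out of memory: Killed process" line.
type oomKill struct {
	PID       int
	Name      string
	TotalVMkB uint64
	AnonRSSkB uint64
}

var oomRe = regexp.MustCompile(`Killed process (\d+) \(([^)]+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB`)

// parseOOMKill extracts the victim PID, command name and memory figures.
// The regex only matches digit runs, so parse errors are ignored here.
func parseOOMKill(line string) (oomKill, bool) {
	m := oomRe.FindStringSubmatch(line)
	if m == nil {
		return oomKill{}, false
	}
	pid, _ := strconv.Atoi(m[1])
	vm, _ := strconv.ParseUint(m[3], 10, 64)
	rss, _ := strconv.ParseUint(m[4], 10, 64)
	return oomKill{PID: pid, Name: m[2], TotalVMkB: vm, AnonRSSkB: rss}, true
}

func main() {
	line := "Out of memory: Killed process 54480 (node) total-vm:26826588kB, anon-rss:23219180kB, file-rss:0kB, shmem-rss:4kB, UID:1001 pgtables:46008kB oom_score_adj:0"
	k, ok := parseOOMKill(line)
	if !ok {
		fmt.Println("no OOM kill found")
		return
	}
	// kB -> GiB: divide by 1024*1024.
	fmt.Printf("pid %d (%s): anon-rss %.2f GiB, total-vm %.2f GiB\n",
		k.PID, k.Name, float64(k.AnonRSSkB)/(1<<20), float64(k.TotalVMkB)/(1<<20))
}
```

For this occurrence it prints `pid 54480 (node): anon-rss 22.14 GiB, total-vm 25.58 GiB`, i.e. the node process alone held roughly 22 GiB of anonymous resident memory when it was killed.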

bogdan-rosianu commented on June 2, 2024

Thank you for the logs!
It seems the validator was shuffled out from shard 2 to shard 1. While syncing the trie for the new shard, it ran out of memory.
We will look into it and try to reproduce.
Also, I assume your machine has enough resources (RAM in particular), right? Just to rule this out.

frankf1957 commented on June 2, 2024

Hi @bogdan-rosianu, there should be plenty of resources: this machine has 32 GB of RAM and 16 vCPUs, with no swap configured:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          31354       13225        1873           1       16255       17665
Swap:             0           0           0

$ nproc
16

AdoAdoAdo commented on June 2, 2024

Hi @frankf1957, there is a tentative fix for the out-of-memory issue during trie sync, merged in PR #4248 and planned for the July release.

frankf1957 commented on June 2, 2024

Hi @AdoAdoAdo, I am still experiencing this issue with my DEVNET nodes.

When I reported this issue, the node version was D1.3.31. It was upgraded to D1.3.36 as soon as the update was available.

I have collected the validator log files and also extracted the kernel OOM killer info from syslog.

My environment details are:
Application version string: Elrond Node CLI App version D1.3.36.0-0-3ec53be47a/go1.17.6/linux-amd64/be36dd5835
Hosting VPS: Hetzner.com/cloud - CPX51 - 16 vCPU - 32G memory - 240 G disk

Kind regards.

frankf1957 commented on June 2, 2024

syslog.oom-kill.log

frankf1957 commented on June 2, 2024

I got a "File size too big" error trying to upload the node logs tar.gz file. I have unpacked it and will upload the logs individually.

frankf1957 commented on June 2, 2024

elrond-node-0-dede76b030d3.log.tar.gz

frankf1957 commented on June 2, 2024

elrond-node-1-null.log.tar.gz

frankf1957 commented on June 2, 2024

elrond-node-2-dedeffa6ae5e.log.tar.gz

frankf1957 commented on June 2, 2024

elrond-node-3-dedeffe364a6.log.tar.gz

frankf1957 commented on June 2, 2024

elrond-node-4-dede323830ed.log.tar.gz

frankf1957 commented on June 2, 2024

elrond-node-5-dede36762228.log.tar.gz

AdoAdoAdo commented on June 2, 2024

Hi @frankf1957, sorry I was not clear in my previous message: the fix is already part of #4245 (#4248 was merged into it), which is the next planned release and is going through final internal testing.

It was initially planned for July, but some issues took longer to fix, so the release was delayed.

frankf1957 commented on June 2, 2024

Closing this issue as the fix will be included in the upcoming rc/2022-july release.
