Comments (24)
elrond-node-0-dede76b030d3.log-20220705-1639.tar.gz
from mx-chain-go.
Are those nodes used to resolve API requests? If so, there are known issues that are fixed in the upcoming v1.3.33 binary release: https://github.com/ElrondNetwork/elrond-go/releases/tag/v1.3.33
ETA for the proper release: this week (04-10.07.2022).
from mx-chain-go.
These are validator nodes only. These are not full-copy nodes either.
These nodes are not used to resolve API requests.
I am running these nodes in support of the development community; I do not do any development myself using them.
from mx-chain-go.
Strange, the log does not contain any reference to a panic (usually we get this logged). The last line states that the heap in use is ~740 MB and the heap idle is ~4.73 GB. Heap idle can be reclaimed by the OS at any time.
Jul 05 16:38:59 elrond-devnet-en6-hel1-1 node[40876]: DEBUG[2022-07-05 16:38:59.208] [common/statistics] [1/1177/1414589/(END_ROUND)] node statistics uptime = 2h22m3s timestamp = 1657039139 num go = 1195 alloc = 478.40 MB heap alloc = 478.40 MB heap idle = 4.73 GB heap inuse = 740.12 MB heap sys = 5.45 GB heap num objs = 1744127 sys mem = 5.78 GB num GC = 9817 FDs = 330 num opened files = 329 num conns = 88 accountsTrieDbMem = 0 B evictionDbMem = 0 B peerTrieDbMem = 0 B peerTrieEvictionDbMem = 0 B
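For readers unfamiliar with these counters, here is a minimal sketch of how a Go program can read similar heap figures from the runtime. It assumes that the "heap idle", "heap inuse" and "heap sys" values in the statistics line correspond to the `HeapIdle`, `HeapInuse` and `HeapSys` fields of `runtime.MemStats`; that mapping is an assumption, not something confirmed in this thread. The `heapSummary` type and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSummary is a hypothetical helper type holding the runtime.MemStats
// fields assumed to match the values in the node statistics line.
type heapSummary struct {
	InUse    uint64 // bytes in spans that hold at least one live object
	Idle     uint64 // bytes in spans that hold no objects
	Released uint64 // part of Idle already returned to the OS
	Sys      uint64 // heap memory obtained from the OS
}

// readHeapSummary takes a snapshot of the Go runtime's heap counters.
func readHeapSummary() heapSummary {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapSummary{
		InUse:    m.HeapInuse,
		Idle:     m.HeapIdle,
		Released: m.HeapReleased,
		Sys:      m.HeapSys,
	}
}

// toMB converts a byte count to mebibytes for display.
func toMB(b uint64) float64 {
	return float64(b) / (1024 * 1024)
}

func main() {
	h := readHeapSummary()
	fmt.Printf("heap inuse    = %.2f MB\n", toMB(h.InUse))
	fmt.Printf("heap idle     = %.2f MB\n", toMB(h.Idle))
	fmt.Printf("heap released = %.2f MB\n", toMB(h.Released))
	fmt.Printf("heap sys      = %.2f MB\n", toMB(h.Sys))
	// Idle minus Released estimates memory the runtime still retains
	// but could hand back to the OS.
	fmt.Printf("retained idle = %.2f MB\n", toMB(h.Idle-h.Released))
}
```

In the line above, heap idle (4.73 GB) plus heap inuse (740.12 MB) adds up to the reported heap sys (5.45 GB), which is consistent with this reading.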
from mx-chain-go.
My devnet machine, which hosts 4 nodes, uses around 7.9 GB according to htop. The real usage is almost certainly lower. Since the OS does not perform any other tasks and therefore does not need the RAM for anything else, it does not bother to reclaim the idle heap.
from mx-chain-go.
I will change my systemd unit files to add --log-save so that the nodes log to disk. If this happens again, we may find details in the disk logs that do not get recorded in the systemd journal.
I will update this issue if the problem occurs again.
from mx-chain-go.
Great! Can you also add the flag -profile-mode?
That will automatically create a new directory called healthRecords next to the node's binary and write profile files into it, so we can try to trace the root cause of the OOM.
from mx-chain-go.
Done and nodes restarted.
The ExecStart directives in the systemd unit files are all set like this one now:
ExecStart=/home/elrond/elrond-nodes/node-0/node -log-save -profile-mode -log-logger-name -log-correlation -log-level *:DEBUG -rest-api-interface localhost:8080
from mx-chain-go.
Looks good 👍
Thanks.
from mx-chain-go.
I have another instance of a running node killed by the kernel OOM killer.
Messages recorded in /var/log/syslog:
Jul 7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767489] apps.plugin invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Jul 7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767515] oom_kill_process.cold+0xb/0x10
Jul 7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767669] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jul 7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767743] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/elrond-node-4.service,task=node,pid=54480,uid=1001
Jul 7 21:52:20 elrond-devnet-en6-hel1-1 kernel: [454779.767888] Out of memory: Killed process 54480 (node) total-vm:26826588kB, anon-rss:23219180kB, file-rss:0kB, shmem-rss:4kB, UID:1001 pgtables:46008kB oom_score_adj:0
Jul 7 21:52:22 elrond-devnet-en6-hel1-1 kernel: [454781.563224] oom_reaper: reaped process 54480 (node), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB
I am attaching the log for node-4 for this occurrence.
elrond-go-2022-07-07-17-48-29.log.tar.gz
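To put the kernel figures in perspective, here is a small sketch that extracts the memory fields from the "Out of memory: Killed process" line quoted above and converts them from kB to GiB. The regular expression and the helper names `parseOOMKill` and `kBToGiB` are hypothetical and only cover the fields that appear in that one line.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// oomFieldRe matches the "<name>:<value>kB" memory fields in the kill line.
var oomFieldRe = regexp.MustCompile(`(total-vm|anon-rss|file-rss|shmem-rss):(\d+)kB`)

// parseOOMKill returns the memory fields of an OOM kill line, in kB.
func parseOOMKill(line string) map[string]uint64 {
	out := make(map[string]uint64)
	for _, m := range oomFieldRe.FindAllStringSubmatch(line, -1) {
		v, err := strconv.ParseUint(m[2], 10, 64)
		if err != nil {
			continue // skip values that do not fit in uint64
		}
		out[m[1]] = v
	}
	return out
}

// kBToGiB converts kibibytes (as reported by the kernel) to gibibytes.
func kBToGiB(kb uint64) float64 {
	return float64(kb) / (1024 * 1024)
}

func main() {
	line := "Out of memory: Killed process 54480 (node) total-vm:26826588kB, " +
		"anon-rss:23219180kB, file-rss:0kB, shmem-rss:4kB, UID:1001 " +
		"pgtables:46008kB oom_score_adj:0"

	fields := parseOOMKill(line)
	fmt.Printf("total-vm: %.2f GiB\n", kBToGiB(fields["total-vm"]))
	fmt.Printf("anon-rss: %.2f GiB\n", kBToGiB(fields["anon-rss"]))
}
```

By this arithmetic, the killed node process alone held roughly 22 GiB of anonymous memory at the time it was killed.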
from mx-chain-go.
Thank you for the logs!
It seems the validator was shuffled out from shard 2 to shard 1. While syncing the trie for the new shard, it ran out of memory.
We will look into it and try to reproduce.
Also, I assume your machine has enough resources (RAM in particular), right? I am asking just to rule this option out.
from mx-chain-go.
Hi @bogdan-rosianu, there should be plenty of resources: this machine has 32 GB of RAM and 16 vCPUs, with no swap configured:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          31354       13225        1873           1       16255       17665
Swap:             0           0           0
$ nproc
16
from mx-chain-go.
Hi @frankf1957, there is a tentative fix for the out-of-memory failure during trie sync. It was merged in PR #4248 and is planned for the July release.
from mx-chain-go.
Hi @AdoAdoAdo I am still experiencing this issue with my DEVNET nodes.
When I reported this issue the node version was D1.3.31. It was upgraded to D1.3.36 as soon as the update was available.
I have collected validator log files, and also extracted the kernel oom killer info from syslog.
My execution details are:
Application version string: Elrond Node CLI App version D1.3.36.0-0-3ec53be47a/go1.17.6/linux-amd64/be36dd5835
Hosting VPS: Hetzner.com/cloud - CPX51 - 16 vCPU - 32G memory - 240 G disk
Kind regards.
from mx-chain-go.
I got a "File size too big" error trying to upload the node logs tar.gz file. I have unpacked them, and will upload individually.
from mx-chain-go.
elrond-node-0-dede76b030d3.log.tar.gz
from mx-chain-go.
elrond-node-2-dedeffa6ae5e.log.tar.gz
from mx-chain-go.
elrond-node-3-dedeffe364a6.log.tar.gz
from mx-chain-go.
elrond-node-4-dede323830ed.log.tar.gz
from mx-chain-go.
elrond-node-5-dede36762228.log.tar.gz
from mx-chain-go.
Hi @frankf1957, sorry I was not clear in my previous message. The fix is already part of #4245 (#4248 was merged into it), which is the next planned release and is going through final internal testing.
It was initially planned for July, but some issues took longer to fix, so the release was delayed.
from mx-chain-go.
Closing this issue as the fix will be included in the upcoming rc/2022-july release.
from mx-chain-go.