Comments (11)

keithchew commented on June 17, 2024

I upgraded a Goerli node from v4.0.8 to v4.2.1 for Dencun and all went fine, so I decided to perform the same upgrade on a mainnet node (running Erigon). I can confirm that after the upgrade the node cannot keep up with the head, and in prysm I get the same error as above:

time="2024-02-03 10:11:28" level=error msg="Could not process slots to get payload attribute" error="could not process slots: context deadline exceeded" prefix=blockchain

I then downgraded to v4.1.1 and the node synced up to head without any issues. I did notice that on v4.2.1 the CPU was constantly hitting 100%, but on v4.1.1 it was more under control (30-50%). I also noticed the active peer count was around 70 vs 45, which I believe is from this PR:
#13278

but it probably has nothing to do with the issue.

from prysm.

keithchew commented on June 17, 2024

Hi @nisdas

I am testing the RC, and it seems to have resolved the issue! It used to trickle each payload one at a time, but now it is pushing a whole bunch through for the node to catch up. CPU is also back to normal, great work!

I have also tested this on the Goerli node and all good there too. I did get this in the logs on startup, but everything seems operational after that...

time="2024-02-06 20:44:56" level=error msg="Error encountered while warming up blob pruner cache." error="pruning failed for 1 root directories: blobs could not be pruned for some roots"

prestonvanloon commented on June 17, 2024

Thanks @keithchew. The "Unable to prune directory" issue is something we are debugging, and your log is very helpful. It shouldn't be a problem at runtime, and you can ignore it for now.

The workaround is to delete the directory $DATADIR/blobs/0x9265c01e6d2fdf61df34b0b025b61a19a1040a78ff13748f634610d4342ac82d

prestonvanloon commented on June 17, 2024

This seems like a geth issue. In your geth logs, it seems to take hundreds of milliseconds to process a single block, sometimes more than one second. It should be much faster than that. Typical processing is less than 40ms to 150ms.

When I've seen this in the past, I deleted the geth db and resynced geth. It's possible that there was an improper shutdown and state healing is ongoing in the background? See ethereum/go-ethereum#28855 (comment)

prestonvanloon commented on June 17, 2024

Potentially related

tunlong commented on June 17, 2024

> This seems like a geth issue. In your geth logs, it seems to take hundreds of milliseconds to process a single block, sometimes more than one second. It should be much faster than that. Typical processing is less than 40ms to 150ms.
>
> When I've seen this in the past, I deleted the geth db and resynced geth. It's possible that there was an improper shutdown and state healing is ongoing in the background? See ethereum/go-ethereum#28855 (comment)

Thank you. But why did beacon 4.1.1 work fine? I just replaced the beacon version with 4.1.1 and didn't do anything else. Before trying 4.1.1, neither restarting the machine nor switching to the RC version helped.

nisdas commented on June 17, 2024

For those running into this issue, this PR should hopefully fix it. We will tag an RC soon, and if all is well this will make it into our next release.

nisdas commented on June 17, 2024

We have an RC here:
https://github.com/prysmaticlabs/prysm/releases/tag/v4.2.2-rc.0

If this goes well, it will be our next release. You can give it a try to see if it resolves your issue.

prestonvanloon commented on June 17, 2024

@keithchew Do you have any error-level logs prior to that one? It should have printed at least one log immediately before that to explain why it was unable to prune a directory.

keithchew commented on June 17, 2024

@prestonvanloon you are right, sorry about that. Here are the 2 errors above it, followed by the one I posted earlier:

time="2024-02-06 20:44:21" level=error msg="Unable to prune directory" directory=0x9265c01e6d2fdf61df34b0b025b61a19a1040a78ff13748f634610d4342ac82d error="slot could not be read from blob file 3.ssz: EOF"
time="2024-02-06 20:44:25" level=error msg="Could not clean up dirty states" error="OriginBlockRoot: not found in db" prefix=state-gen
time="2024-02-06 20:44:56" level=error msg="Error encountered while warming up blob pruner cache." error="pruning failed for 1 root directories: blobs could not be pruned for some roots"
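
For anyone who wants to pull the affected directories out of a longer log, here is a minimal sketch. It assumes the `key=value` / `key="quoted value"` layout seen in the lines above; the function names are made up for illustration and are not part of prysm.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# or a run of non-whitespace characters (layout assumed from the lines above).
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_log_line(line):
    """Split one log line into a dict of its key/value fields.

    Hypothetical helper: surrounding quotes are stripped, but escape
    sequences inside quoted values are left as-is.
    """
    fields = {}
    for key, raw in _FIELD.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1]
        fields[key] = raw
    return fields


def failed_prune_directories(lines):
    """Return the directory field of every 'Unable to prune directory' error."""
    directories = []
    for line in lines:
        fields = parse_log_line(line)
        if (
            fields.get("level") == "error"
            and fields.get("msg") == "Unable to prune directory"
            and "directory" in fields
        ):
            directories.append(fields["directory"])
    return directories
```

Feeding the beacon node log through `failed_prune_directories` gives the list of block-root directories that the pruner complained about.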

prestonvanloon commented on June 17, 2024

@keithchew Following up on this. We did find another bug where blobs were not being saved properly. The issue you mentioned in #13557 (comment) has been resolved in #13648.

Edit: #13648 stops the issue from happening again, but does not clear bad blobs from disk. To stop the log messages, either delete your disk and resync, or delete any zero-byte ssz files from your blobs directory.
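
A rough sketch of the second option, for illustration only. It assumes the blobs directory holds one sub-directory per block root with `<index>.ssz` files inside (as the path and file name earlier in this thread suggest), and that the beacon node is stopped while files are removed. The function names are hypothetical, not prysm tooling.

```python
from pathlib import Path


def find_zero_byte_ssz(blobs_dir):
    """Return a sorted list of empty .ssz files anywhere under blobs_dir."""
    root = Path(blobs_dir)
    return sorted(
        p for p in root.rglob("*.ssz")
        if p.is_file() and p.stat().st_size == 0
    )


def delete_files(paths, dry_run=True):
    """Delete the given files and return the list that was (or would be) removed.

    With dry_run=True (the default) nothing is touched, so the list can be
    reviewed before anything is actually deleted.
    """
    handled = []
    for path in paths:
        if not dry_run:
            path.unlink()
        handled.append(path)
    return handled
```

Calling `find_zero_byte_ssz` on the blobs directory and printing the result first, then passing it to `delete_files(..., dry_run=False)`, keeps the destructive step explicit.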
