[BUG] Performance impact and write amplification with CHANGELOG_SAVE_MODE = 2 (moosefs, OPEN)

Lathanderjk commented on June 15, 2024

[BUG] Performance impact and write amplification with CHANGELOG_SAVE_MODE = 2

from moosefs.

Comments (9)

chogata commented on June 15, 2024

Master doesn't know what the client process would consider a "single operation" for a file - it doesn't know that there is one "dd" command performed that requests 3 actions to be performed on one file. From the master's point of view those 3 operations might as well have been requested by 3 different processes using the same client. We would have to complicate the protocol and introduce some kind of markers from clients to the master to tell the master where the "fsync points" for metadata should be, but that would introduce a whole new level of "complicated" in master - client interactions, which would not be without a significant impact on performance.
May I ask why you are using save mode 2? As far as we know, this is a very rarely used option.

Lathanderjk commented on June 15, 2024

In my test setup I use DRBD to replicate metadata (protocol B and RDMA transport) and manage the MooseFS master via the Pacemaker cluster manager to create an HA setup.
CHANGELOG_SAVE_MODE=2 is necessary for clients and I/O operations to recover correctly and immediately after a master failover. Without CHANGELOG_SAVE_MODE=2 you end up with stuck clients, holes in metadata reported by the master, or missing I/O operations, which is not a reliable filesystem for production.
I really like the performance, low resource requirements and simplicity of MooseFS... metadata operations (reads) are blazing fast compared to other distributed filesystems, even with tens of millions of files.

borkd commented on June 15, 2024

Looks like you have answered your own question - it is a feature, not a bug. CHANGELOG_SAVE_MODE=2 is simply a higher-priced insurance premium that a cluster admin decides to pay to lessen the risk and impact of inevitable unplanned outages. Considering you could operate mfsmaster on a barebones RPi, any specialized/low-latency storage device or replication over RDMA easily falls under the 'sophisticated hardware' umbrella.

xandrus commented on June 15, 2024

Hi,

This is what you can find in the mfsmaster.cfg file description:

# Changelog save mode. There are three modes of writing changelogs:
# 0 - write in background by different process (less safe, but doesn't make master stop in case of heavy hdd load)
# 1 - write in foreground without syncing data (master waits for every changelog to be saved to hdd, but without syncing - a little more safe than the background option, but may cause master to stop and wait for flushing hdd buffers)
# 2 - write in foreground with fsync after each write (very safe, but may make your master very slow unless you have very sophisticated hardware)
# CHANGELOG_SAVE_MODE = 0

So mode 2 will always perform the fsync operation after every write, which is why you see such a performance drop.
And yes - this is not a bug.
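
The difference between the three modes can be illustrated with a minimal Python sketch. This is not MooseFS code: `save_entry`, `background_writer` and `run_mode` are hypothetical names, the entry strings are placeholders, and a thread stands in for the separate writer process that mode 0 is described as using.

```python
import os
import queue
import tempfile
import threading


def background_writer(fd, pending):
    """Drain queued entries to disk; stands in for the separate writer of mode 0."""
    while True:
        data = pending.get()
        if data is None:  # sentinel: stop the writer
            return
        os.write(fd, data)


def save_entry(fd, entry, mode, pending=None):
    """Append one changelog line using the given (hypothetical) save mode."""
    data = (entry + "\n").encode()
    if mode == 0:
        pending.put(data)   # caller returns at once; the write happens later
    elif mode == 1:
        os.write(fd, data)  # caller waits for write(); data may sit in page cache
    elif mode == 2:
        os.write(fd, data)
        os.fsync(fd)        # caller waits until the kernel reports the data flushed
    else:
        raise ValueError(f"unknown save mode: {mode}")


def run_mode(mode, entries):
    """Write entries to a temporary file with one mode and return the file content."""
    fd, path = tempfile.mkstemp()
    try:
        pending, worker = None, None
        if mode == 0:
            pending = queue.Queue()
            worker = threading.Thread(target=background_writer, args=(fd, pending))
            worker.start()
        for entry in entries:
            save_entry(fd, entry, mode, pending)
        if worker is not None:
            pending.put(None)
            worker.join()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

All three modes end up with the same bytes in the file; what differs is how long the caller is blocked and how much can be lost if the machine fails before the data reaches the disk.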

Lathanderjk commented on June 15, 2024

Of course I expected a performance drop and an I/O increase, but not a thousand times... or 3-4000x

chogata commented on June 15, 2024

Changelogs are recorded at least once every second, even if the system is idle. If it's not idle, they are even more frequent. You are asking your kernel to fsync several write operations per second. This has a BIG impact on performance. This option is there only for cases that demand the highest level of security and has no application in most scenarios.
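
A rough way to see the cost of an fsync after every write on a given machine is a small timing loop like the one below. It is only a sketch: `time_appends` is a hypothetical helper, the record size is arbitrary, and the result depends heavily on the storage behind the temporary directory (on a tmpfs, fsync is nearly free and the difference disappears).

```python
import os
import tempfile
import time


def time_appends(count, sync, payload=b"x" * 64 + b"\n"):
    """Append `count` small records; fsync after each one when `sync` is True.

    Returns (elapsed_seconds, bytes_written).
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            if sync:
                os.fsync(fd)  # block until the kernel reports the data flushed
        elapsed = time.perf_counter() - start
        return elapsed, os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    for sync in (False, True):
        elapsed, size = time_appends(200, sync)
        print(f"fsync={sync}: {size} bytes in {elapsed * 1000:.1f} ms")
```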

Lathanderjk commented on June 15, 2024

I am only asking whether this is intended and whether there is no room for some grouping to reduce the fsync calls. One block written to a file with dd causes 6 writes by the mfs master (some will be filesystem...)
I also tried strace with dd count=10, 100 etc., and it does at least 3 fsyncs per single written block.

chogata commented on June 15, 2024

Grouping fsyncs lessens security. This option really isn't meant for regular use and was added only after specific requests from our users. MooseFS used to have only option 0 (that is, writing changelogs in the background, in a separate process), because we knew that any other option would severely impact performance. But for security reasons, when performance wasn't an issue, some users wanted to have the option. We added it, but we always said: use at your own risk. A kind of middle ground is option 1, that is, writing changelogs in the foreground but without those fsyncs. There is no danger of the background process hanging (and nobody noticing, which was the main issue as I remember), and the impact on performance is not as big, at the cost of no fsyncs (so there is still a possibility of losing the tail of the latest changelog file in case of hardware failure, but with grouped fsyncs that risk would also exist).

Lathanderjk commented on June 15, 2024

Grouping fsyncs means less security, that's true, but what about grouping them for a single operation, before sending OK to the client? In this scenario, wouldn't losing one fsync just force the client to retry the operation?

A pure metadata operation like touch or mkdir causes only one fsync, but appending a single line to a file with "date >> myfile" results in 4 fsyncs by mfsmaster every time, and dd causes 3 fsyncs for every single block.
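
The per-operation grouping asked about here can be sketched as a toy model in Python. It is not how mfsmaster works: `CountingLog` and `run` are hypothetical names, and the sketch assumes the log writer knows which changelog entries belong to one client operation, which, according to chogata's first reply in this thread, the master currently does not know.

```python
import os
import tempfile


class CountingLog:
    """Append-only log that counts how many fsync calls it issues."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.fsyncs = 0

    def append(self, entry, sync):
        os.write(self.fd, (entry + "\n").encode())
        if sync:
            self.sync()

    def sync(self):
        os.fsync(self.fd)
        self.fsyncs += 1

    def close(self):
        os.close(self.fd)


def run(operations, grouped):
    """Log every entry of every operation and return the number of fsync calls.

    operations: list of lists; each inner list holds the changelog entries
    produced by one client operation (boundaries are assumed to be known).
    """
    with tempfile.TemporaryDirectory() as tmp:
        log = CountingLog(os.path.join(tmp, "changelog"))
        try:
            for entries in operations:
                for entry in entries:
                    log.append(entry, sync=not grouped)
                if grouped:
                    log.sync()  # one flush, then the client could be acknowledged
            return log.fsyncs
        finally:
            log.close()
```

With ten operations of three entries each (the dd case reported above), `run([["a", "b", "c"]] * 10, grouped=False)` issues 30 fsyncs, while `grouped=True` issues 10.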
