Comments (13)
Has there been any implementation research work done on this since 2/2019?
from moosefs.
Btrfs supports compression, but on large file systems its performance degrades significantly over time due to fragmentation. Btrfs also requires periodic balancing. Unfortunately, Btrfs is not the best fit for chunkservers on rotational HDDs, but I would consider using it on SSDs.
from moosefs.
If the chunkserver sent compressed data directly from storage to the client (and the client decompressed it), we could gain network throughput, at the cost of higher CPU usage on the client side.
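As an illustration of that idea (this is not MooseFS code; Python's stdlib zlib stands in for a fast codec like LZ4, and both function names are hypothetical):

```python
import zlib

def chunkserver_read(raw_chunk):
    """Hypothetical server side: hand the chunk over already compressed
    instead of sending raw bytes."""
    return zlib.compress(raw_chunk, 1)  # fastest setting, standing in for LZ4

def client_receive(wire_data):
    """Hypothetical client side: decompress after the transfer,
    trading client CPU for network throughput."""
    return zlib.decompress(wire_data)

chunk = b"highly repetitive chunk data " * 1000
wire = chunkserver_read(chunk)
assert client_receive(wire) == chunk   # lossless round trip
assert len(wire) < len(chunk)          # repetitive data shrinks on the wire
```

The saving obviously depends on how compressible the data is; already-compressed chunks would gain nothing.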
from moosefs.
LZ4 overhead is negligible and its speed is close to RAM-to-RAM copy:
LZ4 is a very fast lossless compression algorithm, providing compression speed > 500 MB/s per core, scalable with multi-cores CPU. It also features an extremely fast decoder, with speed in multiple GB/s per core, typically reaching RAM speed limits on multi-core systems.
LZ4 has also been supported natively in the Linux kernel since 3.11.
from moosefs.
There are two separate issues raised in this thread:
- compression of chunk data on the chunkserver - this is on our roadmap, but it doesn't have a high priority; we feel that you can use other tools (like a local filesystem with compression) if this is something you really need, and most MooseFS installations we know of store data that is already compressed, meaning this feature would be useless to them anyway; there are other features we are currently implementing that we feel will be useful in a wider range of installations
- compression of chunk data in chunkserver-client communication - well, the problem is, we see more and more MooseFS installations using 10Gb networks, and no compression algorithms are fast enough for that...
from moosefs.
* compression of chunk data in chunkserver-client communication - well, the problem is, we see more and more MooseFS installations using 10Gb networks, and no compression algorithms are fast enough for that...
That depends on the type of data. Highly compressible data might benefit from compression and reduce traffic congestion.
Also, a 10Gb network may not be used exclusively by MooseFS, so compression might still yield improvements under many circumstances.
Let's just make it configurable, perhaps via a chunkserver config option, so compression could be enabled where required (e.g. on 100Mb links).
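To see when on-the-wire compression pays off, here is a back-of-the-envelope pipeline model (my own sketch; the 500 MB/s and 2:1 figures are illustrative, not measured MooseFS numbers):

```python
def effective_throughput(compressor_mbps, link_mbps, ratio):
    """Input-data throughput (MB/s) of a compress-then-send pipeline.
    The compressor caps input at compressor_mbps; the link moves
    compressed bytes, i.e. ratio * link_mbps of input data per second."""
    return min(compressor_mbps, ratio * link_mbps)

# Illustrative numbers: LZ4-class compressor ~500 MB/s, 2:1 compressible data.
assert effective_throughput(500, 125, 2.0) == 250    # 1Gb link: beats raw 125 MB/s
assert effective_throughput(500, 1250, 2.0) == 500   # 10Gb link: loses to raw 1250 MB/s
```

With these numbers compression roughly doubles throughput on a 1Gb link but becomes the bottleneck on 10Gb, which is consistent with both viewpoints in this thread.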
from moosefs.
Curious, when you say:
we feel that you can use other tools (like local filesystem with compression)
what tools or filesystems have been used? ZFS is the only one I know of that supports compression. Are there others?
from moosefs.
Compression could leverage the very handy storage classes, so that compression happens lazily:
A user could request that data be cheaply compressed with e.g. lz4 when chunks enter 'keep' mode, and with something gzip-like if/when files become old enough to reach archive. Even two archive levels could then be useful, to request still more expensive xz-like compression for files that are older yet.
The compression also would not need to happen at the time of the storage class move. It would be enough to consider chunks at least that old eligible for compression, whenever the chunkserver has spare CPU cycles.
Also, all copies of a chunk need not have the same compression. Reads could then prefer the less compressed copy, while redundancy is kept with higher compression.
At least for the last stage, compression could also be skipped when the previous stage did not manage to squeeze out even a percent, as such files are likely already compressed or contain incompressible data.
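A policy like this could be sketched as follows (purely hypothetical; the age thresholds and codec names are illustrative, not MooseFS features):

```python
# Hypothetical tiering sketch: pick a codec by chunk age, mirroring the
# keep -> archive -> older-archive idea above. A background job could apply
# it whenever the chunkserver has spare CPU cycles.
def codec_for_age(age_days):
    if age_days < 7:
        return None       # hot data: leave uncompressed
    if age_days < 90:
        return "lz4"      # cheap compression once in 'keep'
    if age_days < 365:
        return "gzip"     # stronger compression for archive
    return "xz"           # most expensive tier for the oldest data

assert codec_for_age(1) is None
assert codec_for_age(30) == "lz4"
assert codec_for_age(180) == "gzip"
assert codec_for_age(400) == "xz"
```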
from moosefs.
Another problem: I have btrfs with zstd compression, and I keep MooseFS chunks on it.
But when I add new files, MooseFS internally calculates the uncompressed size of the uploaded data and ignores the real compressed size that df -h reports.
df -h shows correct information (zstd compressed data):
First server with two btrfs partitions:
/dev/nvme0n1p2 842G 2.7G 838G 1% /mnt/btrfs1
/dev/nvme1n1p2 842G 2.7G 838G 1% /mnt/btrfs2
Second server with two btrfs partitions:
/dev/nvme0n1p2 842G 2.7G 838G 1% /mnt/btrfs1
/dev/nvme1n1p2 842G 2.7G 838G 1% /mnt/btrfs2
The total real compressed size per server is 2.7 + 2.7 = 5.4 GB.
MooseFS shows 14 GB used out of the 1.6 TB of the two btrfs partitions on the first server, and the same on the second server.
I understand that MooseFS doesn't know about the compression inside btrfs.
As I understand it, this means compression via a native fs like btrfs is simply useless: when MooseFS considers the space to have run out, it will not allow you to write more data, although in reality btrfs will still have plenty of space. Is there no way around this?
from moosefs.
This is strange, because MooseFS should be showing the same thing that df does... A chunk server doesn't calculate data size from chunks; it uses the statvfs system function to check the total disk size and the available size, and then calculates the used space from that.
Unless you use size limiting in hdd.cfg - then it will calculate the size based on chunks, which will not take compression into account. Can you post the content of your hdd.cfg from one of these servers?
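For reference, the numbers statvfs returns can be inspected from Python, whose os.statvfs wraps the same system call; on a compressed btrfs these already reflect the compressed on-disk usage, which is why df and the chunk server should normally agree:

```python
import os

st = os.statvfs("/")                 # any path on the filesystem to inspect
total = st.f_blocks * st.f_frsize    # filesystem size in bytes
free  = st.f_bfree  * st.f_frsize    # free bytes (including root-reserved)
avail = st.f_bavail * st.f_frsize    # bytes available to ordinary processes
used  = total - free                 # what df reports as "Used"
print(total, used, avail)
```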
from moosefs.
@chogata, Hi.
root@srv-1:~# cat /etc/mfs-chunk_btrfs/mfshdd.cfg
/mnt/btrfs1/moosefs-chunk
/mnt/btrfs2/moosefs-chunk
But I have this option:
root@srv-1:~# cat /etc/mfs-chunk_btrfs/mfschunkserver.cfg | grep HDD_LEAVE
HDD_LEAVE_SPACE_DEFAULT = 4GiB
from moosefs.
Ah, that explains it :) A chunk server has no other way to tell the master that it needs to reserve 4GiB than to say the space is occupied. So, per server, each drive has 3GiB of data (rounded up) plus 4GiB of reserved space, and 3+4+3+4 is 14 :)
The HDD_LEAVE_SPACE_DEFAULT setting doesn't change the way used space is calculated by a chunk server - it still uses statvfs. Only the per-disk space restriction in hdd.cfg does.
The assumption is that HDD_LEAVE_SPACE_DEFAULT is a small amount of space left "just in case" (like most OSes always reserve some space on / for the root user only). But if one uses limiting in hdd.cfg, then the disks are probably shared and the chunk server is not the only process storing data on them. Using statvfs would then count other data - not chunks - as chunks, and that's why in this situation we use the "manual" calculation.
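The accounting above can be restated in a couple of lines (my own sketch; the numbers mirror the example in this thread):

```python
def reported_used_gib(chunk_data_gib, leave_space_gib):
    """What one drive shows as used: actual chunk data plus the reserved
    headroom, which the chunk server can only report as 'occupied'."""
    return chunk_data_gib + leave_space_gib

drives = [3, 3]    # ~2.7 GiB of compressed chunks per drive, rounded up
reserve = 4        # HDD_LEAVE_SPACE_DEFAULT = 4GiB
per_server = sum(reported_used_gib(d, reserve) for d in drives)
assert per_server == 14   # matches the 14 GB shown for each server
```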
from moosefs.
@chogata, Thanks. Yeah, sorry, this is my mistake: the option HDD_LEAVE_SPACE_DEFAULT = 4GiB reserves ~4GB per btrfs partition, and I have 4 partitions across 2 chunkservers - I was simply misled by this.
Used space with BTRFS compression is calculated correctly by MooseFS.
A short off-topic question: when will MooseFS / PRO 3.0.117 packages be released for Debian 12? :)
from moosefs.