Comments (16)
Nice. I didn't know there was a function that can make "holes" in a file. It could easily be added to the chunk testing thread. The file is read anyway during testing, so why not "punch" some holes in it along the way :)
from moosefs.
Yes, the implementation is easy and relatively straightforward. Once written, sparse files are completely transparent to I/O and need no code changes whatsoever. The win is huge, both performance-wise and in storage savings. I've managed to free more than 100 GiB on a 1.8 TB HDD by manually sparsifying ~1,000,000 chunk files.
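For anyone curious how much a given file stands to gain, comparing its apparent size with its allocated blocks shows whether (and how) it is sparse. A minimal sketch on Linux, assuming GNU coreutils; `demo.bin` is just an example name:

```shell
# Create a 64 MiB file that is all hole: the logical size is large,
# but no blocks are allocated on disk.
truncate -s 64M demo.bin

logical=$(stat -c %s demo.bin)                 # apparent size in bytes
allocated=$(( $(stat -c %b demo.bin) * 512 ))  # bytes actually on disk

echo "logical=$logical allocated=$allocated"
```

On a sparse-capable filesystem (ext4, XFS, Btrfs, ZFS) the allocated figure for a hole-only file is essentially zero; `fallocate -d` brings existing files back toward that state by deallocating their zero-filled blocks.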
It could easily be added to the chunk testing thread.
Please do not do that. The chunk testing thread should not rewrite chunks, or it may amplify damage in case of problems. Just modify the function that writes (saves) chunks to the HDD.
Janitorial sparsification of existing files is beyond the scope of this suggestion and can be done manually with fallocate -v --dig-holes.
I was thinking about that. I don't see any problem doing it during the 'testing' procedure, but I can make it optional, maybe even switched off by default.
In the case of modifying the 'write' function I have doubts. Checking for zeros inside every block may slow down writes, and people usually do not write blocks of zeros. The only problem we have is small files. As you probably know, when you write only a little data to a chunk (like 4kB), the chunkserver will create a sparse file: only the header (5kB) and this 4kB will be written, then the file size will be set to 69kB, so the file will occupy only about 9kB. The problem is data replication. During replication the whole 69kB will be sent and written, so after replication we can lose the file's 'sparsity', and this might be worth fixing.
My idea is to do that during testing (optionally) and during replication (always), but leave normal writes untouched.
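The replication problem described above can be reproduced with stock tools: a naive copier reads holes back as zeros and writes them all out, while a zero-skipping writer (what the proposal asks the replication path to do) re-creates the holes. A sketch using `dd conv=sparse` as a stand-in for a hole-aware writer; file names are illustrative:

```shell
# Build a 16 MiB file with data in the first and last 4 KiB blocks
# and a hole in between.
dd if=/dev/urandom of=orig bs=4k count=1 2>/dev/null
truncate -s 16M orig
dd if=/dev/urandom of=orig bs=4k count=1 seek=4095 conv=notrunc 2>/dev/null

cat orig > copy.plain                                    # naive copy: zeros get written out
dd if=orig of=copy.sparse bs=4k conv=sparse 2>/dev/null  # zero blocks become holes again
sync

# All three are byte-identical; only the allocated block counts differ.
stat -c '%n size=%s blocks=%b' orig copy.plain copy.sparse
```

The same idea applies inside a replicator: hash or scan each block for zeros and seek past it instead of writing, and sparsity survives the transfer.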
I don't see any problem doing it during the 'testing' procedure, but I can make it optional, maybe even switched off by default.
Testing should do only what its name says: test. Please, pretty please, do not compromise its integrity by turning it into maintenance that rewrites data. If the testing thread starts rewriting data, it may corrupt it under some circumstances. If you want sparsification to run in the background, you would have to add another background process to do just that, but it would be a bad design choice.
Checking for zeros inside every block may slow down writes, and people usually do not write blocks of zeros.
That's why you might want to make it configurable. The sparsification slowdown is negligible and should be more than compensated for every time it saves space.
The only problem we have is small files.
No, that is a problem for the last chunk of nearly all files (except the few without holes that are perfectly aligned to the chunk size).
During replication the whole 69kB will be sent and written, so after replication we can lose the file's 'sparsity', and this might be worth fixing.
Well, it would be great to use LZ4 compression for data transfers, to both clients and chunkservers, as well as to implement native compression of chunk files (see #7).
Sending pre-compressed chunks is even better. :)
Chunkservers should compress and/or sparsify chunks only on write.
This way sparsification will be preserved.
Having it optional may be useful depending on the file system.
Awesome! Thanks! :)
Am I correct in understanding that if I run:
fallocate -v --dig-holes
on my chunkservers (that have files that were mostly written pre-3.0.75), then I will receive some disk space back? Assuming I have many little files that are smaller than 64k in size?
Also, do I need to run this command on all the chunkserver directories (the ones with 00 01 02 ... FF subdirectories), or on the mfsmount'ed drive instead?
Thanks for the clarification, I'm not a storage/SAN guy by trade.
Well, I ran this on one of my drives:
find /mnt/satadrive1/mfschunks -type f -execdir fallocate -v -d {} \;
Let's see how it goes. So far the used space on the HDD is shrinking quite noticeably (my repo is mainly lots of small files, not large things like vmdk, vhd or iso files). Hopefully there will be no major errors or file corruption in the coming days.
Thanks for the tip.
@nrm21 hey nathan, how did the fallocate go in the end? did you get much space back? notice any strangeness with files?
I got about 16GB back once I did that across all my drives, and so far I have not had issues. But I also probably have a small repo (about 1.2TB of data in total) compared to most.
I would say do it slowly, one drive at a time (on the chunk files themselves, not the mfs-mounted files)... wait a day and do the next. If there is an issue, I imagine the scrubbing and re-balance features would take care of it, as long as you go slowly enough.
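The one-drive-at-a-time advice can be wrapped in a small helper. This is only a sketch: the `*.mfs` suffix is an assumption about chunk file naming, the `ionice` throttling is optional, and failures (e.g. a filesystem that cannot punch holes) are reported and skipped rather than aborting the walk.

```shell
# Punch holes in every chunk file under one chunkserver data directory.
sparsify_dir() {
    find "$1" -type f -name '*.mfs' -print0 |
    while IFS= read -r -d '' f; do
        # fallocate -d only deallocates zero-filled blocks; the file's
        # contents are unchanged, so a failure here is harmless.
        ionice -c3 fallocate -d "$f" || echo "could not sparsify: $f" >&2
    done
}

# One drive at a time, e.g.:
# sparsify_dir /mnt/satadrive1/mfschunks
```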
@nrm21 cheers for the update - had a go earlier in fact, impressive! I was already implementing what you'd mentioned - run it on one chunkserver at first and see how it goes - I was just interested in how other people were faring after the operation :)
Hi people!
Can I ask which OS/version you are running?
My chunkservers are mostly CentOS 6/7 and I don't seem to have the right fallocate options (no -v, no -d):
[root@centos7 ~] # fallocate -h
Usage:
fallocate [options] <filename>
Options:
-n, --keep-size don't modify the length of the file
-p, --punch-hole punch holes in the file
-o, --offset <num> offset of the allocation, in bytes
-l, --length <num> length of the allocation, in bytes
-h, --help display this help and exit
-V, --version output version information and exit
For more details see fallocate(1).
I also checked on Ubuntu 12.04 and 14.04 just in case:
[root@ubuntu12 ~] # fallocate -h
Usage:
fallocate [options] <filename>
Options:
-h, --help this help
-n, --keep-size don't modify the length of the file
-o, --offset <num> offset of the allocation, in bytes
-l, --length <num> length of the allocation, in bytes
For more information see fallocate(1).
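A hedged workaround for util-linux builds without --dig-holes (as in the listings above): GNU cp can detect runs of zero bytes and write them as holes, so a copy-and-rename achieves a similar effect. Unlike in-place hole punching, this rewrites the file, so it should only be done with the chunkserver stopped; the file name here is illustrative:

```shell
f=chunk_example.mfs
# Make a fully-allocated demo file: 8 MiB of literal zero bytes on disk.
dd if=/dev/zero of="$f" bs=4k count=2048 2>/dev/null
sync

# Re-copy with zero-run detection, then replace the original;
# cp --sparse=always turns the zero runs into holes in the copy.
cp --sparse=always "$f" "$f.tmp"
mv "$f.tmp" "$f"
```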
All my boxes are Ubuntu 16.04 server (kernel 4.4.x), except one that's still running 14.04 server (3.19.x), and that one also did not have the right flag (as you discovered). I also have a Windows box running a chunkserver (Cygwin has no fallocate that I know of, and while NTFS supports sparse files, it probably does so in a way fallocate wouldn't handle anyway). So I skipped those boxes.
I don't know if the newness of the kernel has anything to do with it; it might be more of a package version thing instead. But Ubuntu 16.04 definitely supports the flag. You might want to try a CentOS, Fedora or RHEL release with a post-3.x kernel.
If I understand correctly, this problem will eventually sort itself out, because all NEW chunk writes are done sparsely as of 3.0.75. If that's the case, I would think you could get away with removing a chunkserver, deleting its contents, and re-adding it clean. Then, on the re-balance, all the chunks written to it would be sparse, yes? Kind of a large hammer for the problem, but it should work. You'd obviously want to do this late at night or at a time when the network is in low usage (if you can even afford that downtime).
I think acid-maker would have to confirm if this logic is sound.
Thank you for your response!
I'll probably test it from an Ubuntu (or similar) live CD; if it works, I can take a chunkserver offline and do it that way. Otherwise I'll have to wait until I can upgrade to 3.0.x.
@richarson here we're running mostly Fedora 23, but there's also an immense FreeNAS box running many drives in raidZ / ZFS / LZ4 compression. I'd offer to have a go at fallocate on that, but to my understanding ZFS/LZ4 should already be squishing everything down as far as it can go. The performance on that box is outstanding anyway 👯
@tehbenneh Thanks!
I agree that ZFS+lz4 should already do the job so fallocate wouldn't make sense in that case.