Comments (16)
Nice. I didn't know there was a function that can make "holes" in a file. It could easily be added to the chunk testing thread. The file is read anyway during testing, so why not "punch" some holes in it along the way :)
from moosefs.
Yes, the implementation is easy and relatively straightforward. Once written, sparse files are completely transparent to I/O and need no code changes whatsoever. The win is huge, both performance-wise and in storage savings. I've managed to free more than 100 GiB on a 1.8 TB HDD by manually sparsifying ~1,000,000 chunk files.
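For anyone curious how much a given file stands to gain, comparing its apparent size with its allocated blocks shows whether (and how) it is sparse. A minimal sketch on Linux, assuming GNU coreutils; `demo.bin` is just an example name:

```shell
# Create a 64 MiB file that is all hole: the logical size is large,
# but no blocks are allocated on disk.
truncate -s 64M demo.bin

logical=$(stat -c %s demo.bin)                 # apparent size in bytes
allocated=$(( $(stat -c %b demo.bin) * 512 ))  # bytes actually on disk

echo "logical=$logical allocated=$allocated"
```

On a sparse-capable filesystem (ext4, XFS, Btrfs, ZFS) the allocated figure for a hole-only file is essentially zero; `fallocate -d` brings existing files back toward that state by deallocating their zero-filled blocks.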
It could easily be added to the chunk testing thread.
Please do not do that. The chunk testing thread should not rewrite chunks, or it may amplify damage in case of problems. Just modify the function that writes (saves) chunks to the HDD.
Janitorial sparsification of existing files is beyond the scope of this suggestion and can be done manually with fallocate -v --dig-holes.
I was thinking about that. I don't see any problem doing it during the 'testing' procedure, but I can make it optional, maybe even switched off by default.
In the case of modifying the 'write' function I have doubts. Checking for zeros inside every block may slow down writes, and people usually do not write blocks of zeros. The only problem we have is small files. As you probably know, when you write only a little data to a chunk (like 4kB), the chunkserver will create a sparse file: only the header (5kB) and this 4kB will be written, then the file size will be set to 69kB, so the file will occupy only about 9kB. The problem is data replication. During replication the whole 69kB will be sent and written, so after replication we can lose the file's 'sparsity', and this might be worth fixing.
My idea is to do that during testing (optionally) and during replication (always), but leave normal writes untouched.
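The replication problem described above can be reproduced with stock tools: a naive copier reads holes back as zeros and writes them all out, while a zero-skipping writer (what the proposal asks the replication path to do) re-creates the holes. A sketch using `dd conv=sparse` as a stand-in for a hole-aware writer; file names are illustrative:

```shell
# Build a 16 MiB file with data in the first and last 4 KiB blocks
# and a hole in between.
dd if=/dev/urandom of=orig bs=4k count=1 2>/dev/null
truncate -s 16M orig
dd if=/dev/urandom of=orig bs=4k count=1 seek=4095 conv=notrunc 2>/dev/null

cat orig > copy.plain                                    # naive copy: zeros get written out
dd if=orig of=copy.sparse bs=4k conv=sparse 2>/dev/null  # zero blocks become holes again
sync

# All three are byte-identical; only the allocated block counts differ.
stat -c '%n size=%s blocks=%b' orig copy.plain copy.sparse
```

The same idea applies inside a replicator: hash or scan each block for zeros and seek past it instead of writing, and sparsity survives the transfer.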
I don't see any problem doing it during the 'testing' procedure, but I can make it optional, maybe even switched off by default.
Testing should do only what its name says: test. Please, pretty please, do not compromise its integrity by turning it into maintenance that rewrites data. If the testing thread starts rewriting data, it may corrupt it under some circumstances. If you want sparsification to run in the background, you would have to add another background process to do just that, but it would be a bad design choice.
Checking for zeros inside every block may slow down writes, and people usually do not write blocks of zeros.
That's why you might want to make it configurable. The sparsification slowdown is negligible and should be more than compensated for every time it saves space.
The only problem we have is small files.
No, that is a problem for the last chunk of nearly all files (except the few without holes that are perfectly aligned to the chunk size).
During replication the whole 69kB will be sent and written, so after replication we can lose the file's 'sparsity', and this might be worth fixing.
Well, it would be great to use LZ4 compression for data transfers, to both clients and chunkservers, as well as to implement native compression of chunk files (see #7).
Sending pre-compressed chunks is even better. :)
Chunkservers should compress and/or sparsify chunks only on write.
This way sparsification will be preserved.
Having it optional may be useful depending on the file system.
Awesome! Thanks! :)
Am I correct in understanding that if I run:
fallocate -v --dig-holes
on my chunkservers (that have files that were mostly written pre-3.0.75), then I will receive some disk space back? Assuming I have many little files that are smaller than 64k in size?
Also, do I need to run this command on all the chunkserver directories (the ones with 00 01 02 ... FF subdirectories), or on the mfsmount'ed drive instead?
Thanks for the clarification, I'm not a storage/SAN guy by trade.
Well, I ran this on one of my drives:
find /mnt/satadrive1/mfschunks -type f -execdir fallocate -v -d {} \;
Let's see how it goes. So far the used space on the HDD is shrinking quite noticeably (my repo is mainly lots of small files, not large things like vmdk, vhd or iso files). Hopefully there will be no major errors or file corruption in the coming days.
Thanks for the tip.
@nrm21 hey nathan, how did the fallocate go in the end? did you get much space back? notice any strangeness with files?
I got about 16GB back once I did that across all my drives, and so far I have not had issues. But I also probably have a small repo (about 1.2TB of data in total) compared to most.
I would say do it slowly, one drive at a time (on the chunk files themselves, not the mfs-mounted files)... wait a day and do the next. If there is an issue, I imagine the scrubbing and re-balance features would take care of it, as long as you go slowly enough.
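The one-drive-at-a-time advice can be wrapped in a small helper. This is only a sketch: the `*.mfs` suffix is an assumption about chunk file naming, the `ionice` throttling is optional, and failures (e.g. a filesystem that cannot punch holes) are reported and skipped rather than aborting the walk.

```shell
# Punch holes in every chunk file under one chunkserver data directory.
sparsify_dir() {
    find "$1" -type f -name '*.mfs' -print0 |
    while IFS= read -r -d '' f; do
        # fallocate -d only deallocates zero-filled blocks; the file's
        # contents are unchanged, so a failure here is harmless.
        ionice -c3 fallocate -d "$f" || echo "could not sparsify: $f" >&2
    done
}

# One drive at a time, e.g.:
# sparsify_dir /mnt/satadrive1/mfschunks
```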
@nrm21 cheers for the update - had a go earlier in fact, impressive! I was already implementing what you'd mentioned - run it on one chunkserver at first and see how it goes - I was just interested in how other people were faring after the operation :)
Hi people!
Can I ask which OS/version you are running?
My chunkservers are mostly CentOS 6/7 and I don't seem to have the right fallocate options (no -v, no -d):
[root@centos7 ~] # fallocate -h
Usage:
fallocate [options] <filename>
Options:
-n, --keep-size don't modify the length of the file
-p, --punch-hole punch holes in the file
-o, --offset <num> offset of the allocation, in bytes
-l, --length <num> length of the allocation, in bytes
-h, --help display this help and exit
-V, --version output version information and exit
For more details see fallocate(1).
I also checked on Ubuntu 12.04 and 14.04 just in case:
[root@ubuntu12 ~] # fallocate -h
Usage:
fallocate [options] <filename>
Options:
-h, --help this help
-n, --keep-size don't modify the length of the file
-o, --offset <num> offset of the allocation, in bytes
-l, --length <num> length of the allocation, in bytes
For more information see fallocate(1).
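A hedged workaround for util-linux builds without --dig-holes (as in the listings above): GNU cp can detect runs of zero bytes and write them as holes, so a copy-and-rename achieves a similar effect. Unlike in-place hole punching, this rewrites the file, so it should only be done with the chunkserver stopped; the file name here is illustrative:

```shell
f=chunk_example.mfs
# Make a fully-allocated demo file: 8 MiB of literal zero bytes on disk.
dd if=/dev/zero of="$f" bs=4k count=2048 2>/dev/null
sync

# Re-copy with zero-run detection, then replace the original;
# cp --sparse=always turns the zero runs into holes in the copy.
cp --sparse=always "$f" "$f.tmp"
mv "$f.tmp" "$f"
```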
All my boxes are Ubuntu 16.04 server (kernel 4.4.x), except one that's still running 14.04 server (3.19.x), and that one also did not have the right flag (as you discovered). I also have a Windows box running a chunkserver (Cygwin has no fallocate that I know of, and while NTFS supports sparse files, it probably does so in a way fallocate wouldn't handle anyway). So I skipped those boxes.
I don't know if the newness of the kernel has anything to do with it; it might be more of a package version thing instead. But Ubuntu 16.04 definitely supports the flag. You might want to try a CentOS, Fedora or RHEL release with a post-3.x kernel.
If I understand correctly, this problem will eventually sort itself out, because all NEW chunk writes are done sparsely as of 3.0.75. If that's the case, I would think you could get away with removing a chunkserver, deleting its contents, and re-adding it clean. Then, on the re-balance, all the chunks written to it would be sparse, yes? Kind of a large hammer for the problem, but it should work. You'd obviously want to do this late at night or at a time when the network is in low usage (if you can even afford that downtime).
I think acid-maker would have to confirm if this logic is sound.
Thank you for your response!
I'll probably test it from an Ubuntu (or similar) live CD; if it works, I can take a chunkserver offline and do it that way. Otherwise I'll have to wait until I can upgrade to 3.0.x.
@richarson here we're running mostly Fedora 23, but there's also an immense FreeNAS box running many drives in raidZ / ZFS / LZ4 compression. I'd offer to have a go at fallocate on that, but to my understanding ZFS/LZ4 should already be squishing everything down as far as it can go. The performance on that box is outstanding anyway 👯
@tehbenneh Thanks!
I agree that ZFS+lz4 should already do the job so fallocate wouldn't make sense in that case.