I have noticed that rdiff-backup is extremely bloaty with one of my destination dirs.

Some destination dirs tend to bloat (rdiff-backup issue, 4 comments, closed)

rdiff-backup commented on June 12, 2024

Some destination dirs tend to bloat


Comments (4)

marshallstokes commented on June 12, 2024

I would assume increment size is a function of the kind of data you are backing up and how much of it changes between runs. For instance, if you host some websites and keep frequent incremental backups of all website source code stored in /var/www, your increment size is likely to stay lean and grow consistently. However, if your websites see a decent amount of traffic and you frequently back up /var/log/httpd, the odds of developing a case of "bloaty destination directory" are pretty high.

Consider the data you are backing up and how frequently it changes and turns over. Even a single log file or temporary folder in your backup source can turn out to be a high-turnover item. If you want to exclude a folder containing unneeded files like bloated logs, the --exclude option and related CLI options are brilliant for this exact use case.
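A rough way to spot such high-turnover folders before writing --exclude rules is to total the size of recently modified files per top-level directory of the backup source. The Python sketch below does that using only file modification times; the function name recent_change_bytes and the one-day window in the usage note are hypothetical choices, and because it cannot see files deleted since the last run, its output is a hint, not a prediction of increment size.

```python
import os
import time
from collections import defaultdict


def recent_change_bytes(source, window_seconds):
    """Sum the sizes of files modified within the last `window_seconds`,
    grouped by the top-level directory under `source`.

    Rough turnover proxy: it only looks at mtime, so files deleted since
    the last backup are not counted.
    """
    cutoff = time.time() - window_seconds
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        top = rel.split(os.sep)[0]  # "." for files directly inside `source`
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime >= cutoff:
                totals[top] += st.st_size
    return dict(totals)
```

For example, recent_change_bytes("/var", 86400) would show whether the log subtree dominates a day's worth of changes compared with, say, www.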


xHire commented on June 12, 2024

Shouldn’t this be accounted for in the increment size number? I wouldn’t mind if the increments were huge, because I understand that those are my backed-up data. What I was wondering is why the cumulative size of all increments is an order of magnitude lower than the total size of the rdiff-backup-data/ directory where they are stored.
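One way to see where that space actually goes is to total rdiff-backup-data/ by file-name prefix, so that the increments/ subtree can be compared with everything else stored alongside it. The Python sketch below assumes that rdiff-backup names its per-session files with a leading word followed by dots (for example mirror_metadata.… or session_statistics.…); that naming is an assumption here, and the function size_by_prefix is hypothetical. It only reads file sizes and changes nothing.

```python
import os
from collections import defaultdict


def size_by_prefix(data_dir):
    """Total bytes under `data_dir`, grouped by the part of each top-level
    entry name before the first dot (e.g. "increments", "mirror_metadata")."""
    totals = defaultdict(int)
    for entry in os.scandir(data_dir):
        key = entry.name.split(".", 1)[0]
        if entry.is_dir(follow_symlinks=False):
            # Directories such as increments/ are summed recursively.
            for dirpath, _dirs, files in os.walk(entry.path):
                for name in files:
                    totals[key] += os.lstat(os.path.join(dirpath, name)).st_size
        else:
            totals[key] += entry.stat(follow_symlinks=False).st_size
    return dict(totals)
```

If "increments" accounts for only a small share of the total, the gap lies in the other per-session files rather than in the increments themselves.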

(By the way, the first example is a directory with my source code, and the second is my e-mail stored in Maildir format, i.e. one file per e-mail message. Even the old mailbox format was fine in terms of bloat, although that was a separate rdiff-backup repository. Both cover almost a year of backups taken every 30 minutes.)


marshallstokes commented on June 12, 2024

This is based on a mere high-level understanding of rdiff-backup from several years of use in a variety of systems, so it might be entirely incorrect.

I don't believe rdiff-backup stores any unnecessary data in its archives which would bloat things up beyond what we expect from the fundamental backup strategy the tool uses. And while I am not totally clear on what "Cumulative Size" is, I think it's the total size of your current increment plus the actual differential metadata describing changes since the previous increment.

So you have a current data set of 765MB and rdiff-backup needs 112MB for increment metadata/diffs. You have 2139 increments in the entire set, and while per-increment metadata varies between increments, with that many increments it is reasonable to wind up with an 11GB archive set.
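To see what this explanation implies, divide the gap between the directory size and the reported increment total by the number of backup sessions. The sketch below assumes that "11GB" means roughly 11,000 MB and that it measures rdiff-backup-data/ alone, as xHire describes, so the 765 MB mirror is not subtracted; the helper name is hypothetical.

```python
def unexplained_mb_per_session(total_mb, accounted_mb, sessions):
    """Average space per backup session not covered by `accounted_mb`."""
    return (total_mb - accounted_mb) / sessions


# Figures from this thread: ~11 GB rdiff-backup-data/ (taken as 11,000 MB),
# 112 MB of reported increments, 2139 sessions.
print(round(unexplained_mb_per_session(11_000, 112, 2139), 2))  # 5.09
```

So for the explanation to hold, each half-hourly session would have to leave roughly 5 MB of data outside the reported increments.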

I admire the ambitious 30-minute backup schedule, but rdiff-backup cannot selectively prune within the archive set (for example, ditching the half-hourlies while keeping daily copies older than one month). So if disk space is an issue, or you find that even trimming increments older than a certain date or restoring any increment is excessively time-consuming and resource-intensive, have a look at some of the other mature backup solutions out there. This is a fantastic tool, but no solution is fit for every backup strategy.
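The selective pruning mentioned in that comment (drop the half-hourly increments but keep one per day once they are older than a month) is a thinning policy. The Python sketch below only computes which backup times such a policy would keep; the function thin and its 30-day default are hypothetical, and it does not touch an rdiff-backup repository, since rdiff-backup's own trimming removes everything older than a given date.

```python
from datetime import datetime, timedelta


def thin(timestamps, now, keep_all_for=timedelta(days=30)):
    """Return the backup times a 'keep everything recent, one per day
    for older' policy would retain, in ascending order.

    Timestamps within `keep_all_for` of `now` are all kept; older ones
    are reduced to the newest timestamp of each calendar day.
    """
    keep = set()
    newest_per_day = {}
    for ts in timestamps:
        if now - ts <= keep_all_for:
            keep.add(ts)
        else:
            day = ts.date()
            if day not in newest_per_day or ts > newest_per_day[day]:
                newest_per_day[day] = ts
    keep.update(newest_per_day.values())
    return sorted(keep)
```

With half-hourly backups, this reduces each day older than a month from 48 restore points to one.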

I hope that is helpful :)


ericzolf commented on June 12, 2024

With the explanations from marshallstokes, I think we can close this issue. Feel free to re-open it if you think that it's still an issue.
