
Comments (4)

marshallstokes commented on June 12, 2024

I would assume increment size is a function of the kind of data you are backing up. For instance, if you host some websites and keep frequent incremental backups of the website source code stored in /var/www, your increment size is likely to stay lean and grow steadily. However, if your websites see a decent amount of traffic and you frequently back up /var/log/httpd, the odds of developing a case of "bloaty destination directory" are pretty high.

Consider the data you are backing up and how frequently it changes and turns over. Even a single log file or temporary folder in your set can turn out to be a high-turnover item in your backup source. If you want to exclude a folder containing unneeded files such as bloated logs, the --exclude option and the related CLI options are brilliant for this exact use case.
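To make the --exclude suggestion concrete, here is a minimal Python sketch that assembles such a command line. The source /var, the destination /mnt/backup/var, the excluded paths, and the helper name build_backup_command are all hypothetical, and the classic "rdiff-backup [options] SOURCE DESTINATION" form is assumed; check the man page of your installed version before relying on it.

```python
import shlex


def build_backup_command(source, destination, excludes):
    """Return an rdiff-backup argument list that skips the given paths.

    Assumes the classic 'rdiff-backup [options] SOURCE DESTINATION' form.
    """
    cmd = ["rdiff-backup"]
    for path in excludes:
        # One --exclude per path; each path lies inside the source tree.
        cmd += ["--exclude", path]
    cmd += [source, destination]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_backup_command(
    "/var",
    "/mnt/backup/var",
    ["/var/log/httpd", "/var/tmp"],
)
print(shlex.join(cmd))
```

The sketch only prints the command so it can be reviewed first; passing the same list to subprocess.run would execute it.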

from rdiff-backup.

xHire commented on June 12, 2024

Shouldn’t this be accounted for in the increment size number? I wouldn’t mind if the increments were huge, because I understand that those are my backed-up data. What I was wondering is why the cumulative size of all increments is an order of magnitude lower than the total size of the rdiff-backup-data/ directory where they are stored.
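One way to look into this gap is to break the directory's size down by what is stored in it. The following is a rough sketch, not an rdiff-backup feature: the function name size_by_top_level and the grouping rule (subdirectories by name, top-level files by the part of the name before the first dot) are my own assumptions. It sums apparent file sizes, which can be smaller than the disk usage du reports when there are many small files.

```python
import os
from collections import defaultdict


def size_by_top_level(root):
    """Sum apparent file sizes in bytes under root, grouped by top-level entry.

    Files inside a subdirectory are counted under 'subdir/'. Files directly
    in root are grouped by the part of their name before the first dot.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not count symlink targets
            if rel == ".":
                key = name.split(".")[0]
            else:
                key = rel.split(os.sep)[0] + "/"
            totals[key] += os.path.getsize(path)
    return dict(totals)


def report(root):
    """Print the groups from largest to smallest, in MiB."""
    totals = size_by_top_level(root)
    for key, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 2**20:10.1f} MiB  {key}")
```

Calling report() on the rdiff-backup-data/ directory of a repository would show whether the space sits in the increments themselves or in the other files kept alongside them.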

(By the way, the first example is a directory with my source code; the second is my e-mail stored in Maildir format, i.e., one file per e-mail message. Even the old mailbox format was fine in terms of bloating, although that was a separate rdiff-backup repository. Both represent almost a year of backups taken every 30 minutes.)


marshallstokes commented on June 12, 2024

This is based on a mere high-level understanding of rdiff-backup from several years of use in a variety of systems, so it might be entirely incorrect.

I don't believe rdiff-backup stores any unnecessary data in its archives that would bloat things beyond what we expect from the tool's fundamental backup strategy. And while I am not totally clear on what "Cumulative Size" is, I think it's the total size of your current increment plus the actual differential metadata describing changes since the previous increment.

So you have a current data set of 765MB and rdiff-backup needs 112MB for increment metadata/diffs. You have 2139 increments in the entire set, and while per-increment metadata varies between increments, with that many increments it is reasonable to wind up with an 11GB archive set.
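As a back-of-the-envelope check on those figures, the snippet below divides the roughly 11 GB archive by the 2139 increments. It assumes 1 GB = 1024 MB and spreads the whole archive evenly across increments, which is a simplification rather than how rdiff-backup actually accounts for space.

```python
# Figures quoted in this thread; the GB-to-MB conversion is an assumption.
MB_PER_GB = 1024
archive_mb = 11 * MB_PER_GB   # about 11 GB archive set
increments = 2139             # number of increments in the set

avg_mb_per_increment = archive_mb / increments
print(f"{avg_mb_per_increment:.2f} MB per increment on average")
# prints "5.27 MB per increment on average"
```

So an average of a few megabytes per backup run is enough to reach an archive of that size.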

I admire the ambitious 30-minute backup schedule, but rdiff-backup cannot selectively prune within the archive set (ditch the half-hourlies and keep daily copies for anything older than one month, for example). If disk space is an issue, or you find that even trimming increments older than a certain date or restoring any increment is excessively time-consuming and resource-intensive, have a look at some of the other mature backup solutions out there. This is a fantastic tool, but no solution fits every backup strategy.

I hope that is helpful :)


ericzolf commented on June 12, 2024

With the explanations from marshallstokes, I think we can close this issue. Feel free to re-open it if you think that it's still an issue.

