
candide-guevara / btrfs_to_glacier


Makes periodic snapshots of my btrfs volumes and uploads them to Glacier.

License: GNU General Public License v2.0

Makefile 0.56% C 16.00% Shell 1.44% Go 82.00%


btrfs_to_glacier's Issues

Replace custom XOR key keyring encryption with `gpg`

Problem

We use a custom scheme to encrypt/decrypt the backup keyring.
We could do it with gpg; that way the passphrase would be cached by the gpg agent and would not be stored in the application's memory.

Solution

  • Replace custom code in the encryption package with shell calls to gpg (see the sketch below).
    • Could not find a standard Go library for gpg.
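
A minimal sketch of what the shell-out could look like, assuming the keyring is decrypted from a file; the function name and path handling are illustrative, not taken from the repo:

```go
package encryption

import (
	"bytes"
	"fmt"
	"os/exec"
)

// decryptKeyringWithGpg is a hypothetical replacement for the custom XOR
// scheme: it shells out to gpg so that gpg-agent handles passphrase prompting
// and caching, and the password never lives in this process' memory.
func decryptKeyringWithGpg(encryptedPath string) ([]byte, error) {
	var out, errBuf bytes.Buffer
	cmd := exec.Command("gpg", "--quiet", "--decrypt", encryptedPath)
	cmd.Stdout = &out
	cmd.Stderr = &errBuf
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("gpg --decrypt failed: %v: %s", err, errBuf.String())
	}
	return out.Bytes(), nil
}
```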

How to get AWS credentials?

Problem

  • No plain-text AWS secret keys should be stored (e.g. in ~/.aws/credentials).
  • Always run as the least privileged IAM user, except for admin operations.
  • Should we rely on ~/.aws/config, or should the application config be standalone?
  • Should we have different passwords for the keyring and the IAM users?

Solution

Add a canary to validate backup longevity

Problem

  • How can I be sure that newer versions of btrfs can understand older btrfs-send dumps?
  • If compatibility is silently broken, then the full chain of differential snapshots cannot be restored by a single btrfs version :/
    • You may still be able to recover the data, but it will be tedious. This is why the metadata records the btrfs version for each snapshot.

Solution

Implement a "canary" backup/restore that runs before the real backup.

  • Use a dummy encryption key.
  • For each backup, create separate metadata and storage infrastructure to contain the canary.
    • For AWS, S3 data should live in the standard storage class so that restores are fast.
  • The data is completely synthetic but should contain enough information to verify itself (see the sketch below).
    • Each file name should be the hash of its contents.
    • Each new snapshot should create a new directory. Its name is the concatenation of the snapshot uuid and a hash over the hashes of the files it contains.
    • A file at the root of the snapshot should contain the concatenation of the uuids of the previous snapshots.

If there is an error when restoring ALL the snapshots in the canary chain, then there is a compatibility issue. The best option is probably to stop and start a new snapshot chain from scratch.
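
A rough sketch of how the synthetic data could be generated under the naming rules above; the function and parameter names are invented for illustration:

```go
package canary

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
	"strings"
)

func hashHex(data []byte) string {
	h := sha256.Sum256(data)
	return hex.EncodeToString(h[:])
}

// writeCanarySnapshot creates one synthetic snapshot under `root`:
// each file name is the hash of its contents, the snapshot directory name is
// the snapshot uuid concatenated with a hash over the per-file hashes, and a
// file at the root lists the uuids of the previous snapshots in the chain.
func writeCanarySnapshot(root, uuid string, prevUuids []string, contents [][]byte) error {
	var fileHashes []string
	for _, c := range contents {
		fileHashes = append(fileHashes, hashHex(c))
	}
	dir := filepath.Join(root, uuid+"_"+hashHex([]byte(strings.Join(fileHashes, ""))))
	if err := os.MkdirAll(dir, 0755); err != nil {
		return err
	}
	for i, c := range contents {
		if err := os.WriteFile(filepath.Join(dir, fileHashes[i]), c, 0644); err != nil {
			return err
		}
	}
	// Self-verification anchor: the uuids of all previous snapshots.
	return os.WriteFile(filepath.Join(dir, "prev_uuids"),
		[]byte(strings.Join(prevUuids, "\n")), 0644)
}
```

On restore, a verifier can recompute every hash and walk the uuid chain without consulting any external metadata.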

Do 2 invocations of btrfs-send return the exact same data?

If I call btrfs-send twice to get differential data for the same snapshot, do I get the exact same bytes?

This is important if I want to enable resumable uploads of snapshots.
Snapshot data is broken into chunks. If an upload fails but some chunks were uploaded successfully, can I just resume uploading the missing chunks? Will the result be byte-for-byte identical to a fully successful upload?
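
A throwaway experiment can answer this directly; the sketch below (snapshot paths are placeholders) hashes two consecutive `btrfs send` streams and compares them:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os/exec"
)

// sendStreamHash hashes the full output of one `btrfs send` invocation.
func sendStreamHash(parent, snap string) (string, error) {
	cmd := exec.Command("btrfs", "send", "-p", parent, snap)
	out, err := cmd.StdoutPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", err
	}
	h := sha256.New()
	if _, err := io.Copy(h, out); err != nil {
		return "", err
	}
	if err := cmd.Wait(); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	h1, err := sendStreamHash("/mnt/snaps/prev", "/mnt/snaps/cur")
	if err != nil {
		log.Fatal(err)
	}
	h2, err := sendStreamHash("/mnt/snaps/prev", "/mnt/snaps/cur")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("identical streams:", h1 == h2)
}
```

If the hashes ever differ, resumable uploads would have to re-send the whole stream rather than just the missing chunks.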

A corrupted app configuration should not prevent restores

Problem

The keyring is stored encrypted in the configuration. If it gets corrupted or the config file is lost, then all backups become unusable.

Solution

  • Is it enough to simply know the config file is versioned on GitHub?
  • Should the encrypted keyring be stored in the metadata?
    • In that case the keyring password and the backup IAM user credentials should be different. Otherwise, breaking into the backup account would allow decryption of the backup data.

Add playbooks

Problem

I will forget how to do things after a while of not working on this project.

Solution

Document the following procedures.

  • How to do an incremental backup
  • How to do a new backup from scratch
  • How to clean old snapshots in source and backup
  • How to delete stuff in Glacier
  • How to do a restore
  • How to revoke and create keys

BackupRestoreCanary must validate multi-chunk snapshots

types.BackupRestoreCanary should provide an API for validating large multi-chunk snapshots.

The new API should stay compatible with small snapshots, so that the canary can create an arbitrary chain of large and small snapshots.

Add a large-snapshot test in workflow/backup_restore_canary/canary_integration for the in-memory case.
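
A hypothetical shape for the extended API, just to make the requirement concrete; none of these method names come from the repo:

```go
package types

import "context"

// BackupRestoreCanary sketch: a size hint lets a single validation chain
// mix small snapshots and large multi-chunk snapshots.
type BackupRestoreCanary interface {
	Setup(ctx context.Context) error
	// AppendSnapshotToChain writes roughly `approxBytes` of synthetic data,
	// forcing multiple chunks when the value exceeds the chunk size.
	AppendSnapshotToChain(ctx context.Context, approxBytes int64) error
	RestoreChainAndValidate(ctx context.Context) error
	TearDown(ctx context.Context) error
}
```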

Add static canary subvolume chunks in deep glacier

Problem

I want to have a "quick" way to check I can restore chunks from deep glacier into a btrfs subvolume.

Solution

Put a static subvolume (i.e. the same subvolume for every backup) in the canary storage infrastructure.

  • Metadata is hardcoded so that the metadata infrastructure is not needed to restore it (see the sketch below).
  • Just add enough data for a single subvolume (chain restores will be tested using another mechanism).
  • len(chunks) > 1, but do not store a lot of data since this test costs money.
  • This static subvolume is created only once: it is written when you create a new backup chain.

A conscientious user can run this at least once a year and check everything is OK. The only downside is that it will cost a few cents and take a few hours.
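
The hardcoded metadata could be as simple as a compiled-in table; the keys, sizes and hashes below are placeholders:

```go
package canary

// StaticChunk describes one chunk of the static canary subvolume with enough
// information to fetch and verify it without the real metadata infrastructure.
type StaticChunk struct {
	S3Key     string // object key in the canary storage bucket
	Size      int64  // bytes
	Sha256Hex string // checksum of the chunk, checked after restore
}

// StaticCanarySubvolume is written only once, when a new backup chain is
// created. len(chunks) > 1 so the multi-chunk restore path is exercised, but
// the total size stays small because Deep Archive retrievals cost money.
var StaticCanarySubvolume = []StaticChunk{
	{S3Key: "canary/static/chunk-0", Size: 4 << 20, Sha256Hex: "<hash-0>"},
	{S3Key: "canary/static/chunk-1", Size: 4 << 20, Sha256Hex: "<hash-1>"},
}
```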

Do I need to compress snapshot streams?

Problem

Does compressing the stream (before encryption) bring any benefit?

Solution

  • Measure compression ratios on real snapshots (see the measurement sketch below).
  • If it is worth it, implement it.
  • Otherwise document the decision and rename types accordingly.
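
A quick way to get the numbers (snapshot paths are placeholders; stdlib gzip is only used here to get a ballpark ratio before committing to a real codec):

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"os/exec"
)

// countingWriter tallies how many compressed bytes gzip produces.
type countingWriter struct{ n int64 }

func (w *countingWriter) Write(p []byte) (int, error) {
	w.n += int64(len(p))
	return len(p), nil
}

func main() {
	cmd := exec.Command("btrfs", "send", "-p", "/mnt/snaps/prev", "/mnt/snaps/cur")
	stream, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	compressed := &countingWriter{}
	zw := gzip.NewWriter(compressed)
	rawBytes, err := io.Copy(zw, stream) // rawBytes = uncompressed stream size
	if err != nil {
		log.Fatal(err)
	}
	zw.Close()
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("raw=%d compressed=%d ratio=%.2f\n",
		rawBytes, compressed.n, float64(rawBytes)/float64(compressed.n))
}
```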

AWS: fix vulnerability if an attacker gets control of the backup role account

Problem

  • Metadata and storage infrastructure can be created by the backup role account.
  • The backup role account can delete or tamper with storage and metadata.

Goal

If an attacker gets control of the backup role account, they should NOT be able to render the backup data unusable.

Solution

  • Assert that infrastructure is created only by the root user.
    • The backup role should only be able to write (see the probe sketch below).
  • Use S3 object ownership so that, once data is written, it cannot be modified by the backup role.
  • At the end of each successful backup, make a backup of DynamoDB that the backup role cannot tamper with.
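
One way to assert the write-only property from inside the application (bucket and key names are placeholders): run a probe under the backup role that attempts a delete and requires it to be denied.

```go
package main

import (
	"context"
	"errors"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/smithy-go"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx) // must resolve to the backup role
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Probe: the backup role must NOT be allowed to delete objects.
	_, err = client.DeleteObject(ctx, &s3.DeleteObjectInput{
		Bucket: aws.String("my-backup-bucket"),
		Key:    aws.String("probe/never-delete-me"),
	})
	var apiErr smithy.APIError
	if errors.As(err, &apiErr) && apiErr.ErrorCode() == "AccessDenied" {
		log.Println("OK: backup role cannot delete objects")
		return
	}
	log.Fatalf("backup role was able to delete (or unexpected error): %v", err)
}
```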

Use latest protocol for snapshot streams

Problem

btrfs-send can encode streams using different protocols. Which one should I choose to ensure longevity?

Solution

  • Use --proto=0 to use the latest version (which depends on the running Linux kernel and btrfs-progs versions); see the invocation sketch below.
    • Over time the protocol will change. btrfs-receive should understand old protocols, but the best strategy is probably to create a new snapshot sequence every few years to remove the dependence on old protocols.

Note: the --compressed-data option is not useful here, since it only applies if the btrfs filesystem uses transparent file compression. It does not control whether btrfs-send compresses the stream.
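
A minimal sketch of the invocation (paths are placeholders): request the newest protocol with --proto=0 and record the btrfs-progs version alongside the snapshot, so every dump can be tied back to the tool that produced it.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
)

func main() {
	// Record the tool version; it belongs in the snapshot metadata.
	version, err := exec.Command("btrfs", "--version").Output()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("producing stream with: %s", version)

	dump, err := os.Create("/tmp/snap.btrfs_stream")
	if err != nil {
		log.Fatal(err)
	}
	defer dump.Close()

	// --proto=0 selects the highest protocol supported by the running kernel
	// and btrfs-progs.
	cmd := exec.Command("btrfs", "send", "--proto=0",
		"-p", "/mnt/snaps/prev", "/mnt/snaps/cur")
	cmd.Stdout = dump
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```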
