Friday, June 25, 2010

Encrypted Incremental Backups to S3

I spent some time this week trying to get secure online backups working for all my machines.

So far, I've been managing most of my data and workspaces with replicated Git repositories. I have scripts that allow me to maintain roaming profiles across my machines (almost) seamlessly, and these scripts try to ensure that these profiles are consistently replicated. My profiles include things like dot files (.vimrc, .screenrc, etc.), startup scripts, tools, workspaces, repositories, and other odds and ends.
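
The sync scripts themselves aren't terribly sophisticated. Stripped down to the bare idea (the repository path and remote name below are placeholders for illustration, not my actual setup), the core of it looks something like this:

#!/bin/sh
# sync-profile: replicate the roaming profile repository.
# PROFILE_DIR and the "origin" remote are placeholders for illustration.
PROFILE_DIR="$HOME/.profile-repo"

cd "$PROFILE_DIR" || exit 1
git pull --rebase origin master          # grab changes made on other machines
git add -A                               # stage local changes (dot files, tools, etc.)
git commit -m "sync from $(hostname)" || true
git push origin master                   # replicate back out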

Because I tend to be ultra-paranoid about security and reliability, the replicas are encrypted and distributed across different machines in different locations. For encryption, I use ecryptfs on Linux machines, and FileVault on the Mac.

Anyhow, this week I lost my Mac to a hardware failure, and my co-located Linux machine to a service provider shutting down. That left me with one replica... just waiting to fail.

I decided that I needed another replica, but didn't want to pay for, or have to set up, another co-located server. After spending some time researching various online-backup providers, I decided to go with Amazon's S3 service.

I chose S3 because it's cheap, it's tried and tested, it's built on an internal distributed and replicated database, and there are some great tools that work with it.

Brackup


Brackup, by Brad Fitzpatrick, is one of those tools. It allows you to make encrypted incremental backups to S3, without a lot of hair-pulling or teeth-gnashing.

To get Brackup running on your machine, you need to have GPG, Perl 5, and the Net::Amazon::S3 Perl module installed. On the Mac, you also need to get MacPorts.

Installation


Most modern distributions come with Perl 5 pre-installed, but not with GPG. The package name you want on both MacPorts and Ubuntu is gnupg.
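
Installing it is a one-liner on either platform:

ubuntu$ sudo apt-get install gnupg
mac$ sudo port install gnupg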

The first thing you need to do, if you don't already have a GPG key, is to generate one.

$ gpg --gen-key

If you need to back up multiple machines, export your public key to a text file, and import it on the other machines.

hostA$ gpg --export -a "User Name" > public.key
hostA$ scp public.key hostB:/tmp
hostB$ gpg --import /tmp/public.key
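
One gotcha: GPG won't encrypt to a freshly imported key until you tell it to trust the key, so you may need to bump the trust level on the importing machine. The interactive trust menu does the job (pick the trust level when it asks):

hostB$ gpg --edit-key "User Name"
gpg> trust
gpg> quit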

Remember that all your backups will be encrypted with your public key, so if you lose your private key, the only thing you can do with your backups is generate white noise. Export your private key and save it in a safe place. (I suggest VexCrypto.)

$ gpg --export-secret-key -a "User Name" > private.key
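
When the day comes to restore onto a fresh machine, import it back before restoring:

$ gpg --import private.key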

Now that you have your keys set up, download and install Brackup. The easiest way to do this is by using the cpan tool.

$ sudo cpan Net::Amazon::S3
$ sudo cpan Brackup

Note that it's better (and way faster) to use your distribution's package for Net::Amazon::S3. On Ubuntu the package is libnet-amazon-s3-perl, and on MacPorts, the package is p5-amazon-s3.
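
In other words, something like:

ubuntu$ sudo apt-get install libnet-amazon-s3-perl
mac$ sudo port install p5-amazon-s3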

Configuration


Once this is done, you can generate a template configuration file by running brackup on the command line. This file is stored in $HOME/.brackup.conf.

$ brackup
Error: Your config file needs tweaking.  I put a commented-out template at: /home/muthanna/.brackup.conf

brackup --from=[source_name] --to=[target_name] [--output=]
brackup --help

Edit the configuration file and create your sources and targets. You will likely have multiple sources and one target. Here's a snippet of my configuration:

[TARGET:amazon]
type = Amazon
aws_access_key_id  = XXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXX
keep_backups = 10

[SOURCE:mac_repos]
path = /Users/0xfe/Local
chunk_size = 5m
gpg_recipient = 79E44165
ignore = ^.*\.(swp|swo|hi|o|a|pyc|svn|class|DS_Store|Trash|Trashes)$

[SOURCE:mac_desktop_books]
path = /Users/0xfe/Desktop/Books
gpg_recipient = 79E44165
ignore = ^.*\.(swp|swo|hi|o|a|pyc|svn|class|DS_Store|Trash|Trashes)$

[SOURCE:mac_desktop_workspace]
path = /Users/0xfe/Desktop/Workspace
gpg_recipient = 79E44165
ignore = ^.*\.(swp|swo|hi|o|a|pyc|svn|class|DS_Store|Trash|Trashes)$

The configuration keys are pretty self-explanatory. I should point out that gpg_recipient is your public key ID, as shown by gpg --list-keys.

$ gpg --list-keys
/Users/0xfe/.gnupg/pubring.gpg
-----------------------------------
pub   2048R/79E44165 2010-06-24
uid                  My Username <snip@snip.com>
sub   2048R/43AD4B72 2010-06-24

For more details on the various parameters, see The Brackup Manual.

Start a Backup


To back up one of your sources, use the brackup command, like so:

$ brackup -v --from=mac_repos --to=amazon

If you now take a look at your AWS Dashboard, you should see the buckets and chunks created for your backup data.

Notice that brackup creates an output file (with the extension .brackup) in the current directory. This file serves as an index, and maintains pointers to the S3 chunks for each backed-up file. You will need this file to locate and restore your data, and a copy of it is maintained on S3.

Restoring


"Test restores regularly." -- a wise man.

To restore your Brackup backups, you will need to have your private key handy on the machine that you're restoring to. Brackup accesses your private key via gpg-agent.

$ sudo port install gpg-agent
$ eval $(gpg-agent --daemon)
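
If you don't want to re-run that eval in every new shell, stick it in your shell startup file. This sketch is crude (it assumes bash, and only starts an agent if one isn't already advertised), but it's good enough for occasional restores:

# In ~/.bash_profile (or ~/.bashrc):
if [ -z "$GPG_AGENT_INFO" ]; then
    eval $(gpg-agent --daemon)
fi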

The brackup-restore command restores a source tree to a path specified on the command line. It makes use of the output file that brackup generated during the initial backup to locate and restore your data. If you don't have a local copy of the output file, you can use brackup-target to retrieve a copy from S3.

$ brackup-restore -v --from=mac_repos-20100624.brackup \
  --to=/Users/0xfe/temp/mac_repos --all
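
If the local copy of the metafile is missing, you can pull one down from S3 first. The invocation looks roughly like this (check brackup-target --help for the exact subcommand names):

$ brackup-target amazon list_backups
$ brackup-target amazon get_backup mac_repos-20100624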

You will be prompted for your AWS key, your AWS secret key, and your GPG private key passphrase. Make sure that the restore completed successfully and correctly. Comparing the SHA1 hashes of the restored data with those of the original data is a good way to validate correctness.
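
A quick-and-dirty way to do that comparison, using shasum on the Mac (sha1sum on Linux works the same way):

$ (cd /Users/0xfe/Local && find . -type f -exec shasum {} \; | sort) > /tmp/original.sha1
$ (cd /Users/0xfe/temp/mac_repos && find . -type f -exec shasum {} \; | sort) > /tmp/restored.sha1
$ diff /tmp/original.sha1 /tmp/restored.sha1 && echo "restore verified"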

Garbage Collection


You will need to prune and garbage collect your data regularly to keep backups from piling up and using up space in S3. The prune command removes old backups beyond the keep_backups limit configured for the target, and gc deletes chunks that are no longer referenced by any backup.

$ brackup-target amazon prune
$ brackup-target amazon gc
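
If you'd rather not have to remember any of this, a crontab along these lines (the schedule is entirely up to you) will back up nightly and clean up weekly:

# m h dom mon dow  command
0 3 * * *    brackup --from=mac_repos --to=amazon
0 4 * * 0    brackup-target amazon prune && brackup-target amazon gc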

That's all folks! Secure, on-line, off-site, incremental, buzz-word-ridden backups. Code safely!

4 comments:

  1. Good information. Would love to read more about your git home directory setup.

  2. I have some of the scripts, .rc files, etc. on github:

    http://github.com/0xfe/evil

  3. Have you had a look at tarsnap (admittedly more expensive per gig) ?

    http://www.tarsnap.com/

  4. Fantastic post! Thank you for the time you put in to share this.
