Project: Automated offsite backups for an NSLU2 – part 6

11 November 2006

Previously in this series: Part 1, Part 2, Part 3, Part 4, Part 5.

I now know that in order to get automated offsite backups from my NSLU2 to Amazon S3, I have to get it to run Ruby so that it can run s3sync. I want to back everything up first, which is going to take some time – so while that’s happening, I’ll get both Ruby and s3sync installed on an Ubuntu Linux machine as a dry run.

Firstly, I need to download s3sync from its homepage, and unzip/untar it into a convenient directory.

The next step is to install Ruby. The machine I am using to test this is running Ubuntu 6.10 (“Edgy Eft”), and to install standard packages on it you use a tool called Synaptic Package Manager. This has somewhat conflicting information regarding Ruby. The basic package – called, unsurprisingly, ruby – claims to be version 1.8.2, and s3sync requires 1.8.4 or higher. There is also a ruby1.8 package, which appears to be version 1.8.4-5ubuntu1.1, so that sounds like it might be a good one to use. However, its notes state that “on Debian, Ruby 1.8 is provided as separate packages. You can get full Ruby 1.8 distribution by installing following packages. ruby1.8 ruby1.8-dev” etc. Most of the listed packages do not show up in the package manager. To make things even more confusing, the ruby package appears to depend on the ruby1.8 one, which implies that it actually is version 1.8.4… To keep things simple, I installed the ruby package, and then typed ruby --version at the command line; it told me that it was version 1.8.4. OK.
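For what it’s worth, you can do the same check from within Ruby itself. Here’s a tiny sketch of the version comparison s3sync effectively needs – the helper name is my own invention, not anything from s3sync:

```ruby
# A sketch of the version check s3sync effectively needs: is the
# running Ruby at least 1.8.4? (Helper name is mine, not s3sync's.)
def ruby_new_enough?(version, required = [1, 8, 4])
  # Array#<=> compares element by element, so "1.8.2" sorts below [1, 8, 4]
  (version.split('.').map { |part| part.to_i } <=> required) >= 0
end

puts "Ruby #{RUBY_VERSION}: #{ruby_new_enough?(RUBY_VERSION) ? 'new enough' : 'too old for s3sync'}"
```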

The next step was to see if I could sync something with my S3 account. John Eberly has kindly blogged the details of how he set up s3sync on his system, and it’s clear that setting up a bucket on S3 in advance using a separate application is a good idea; so, I downloaded jets3t Cockpit as he suggests and tried it out. (Note – you need Java 1.4 or higher installed to run it – as an old Java hand using a newish machine, I was surprised to notice that I’d not yet installed it on my workstation. That’s what switching to IronPython does to you.)

The jets3t Cockpit is a pretty simple app – you can set up multiple logins in a “saved login” screen, or you can log in directly using the Access Key ID/Secret Access Key that Amazon provide. Once you’re logged in, it is pretty trivial to create a new bucket – and, because all S3 buckets exist in the same namespace (even across users!), it sensibly suggests bucket names that start with your user ID. So, I created a bucket called <my-access-key-id>.Test.

Now, let’s see if we can backup the s3sync install directory itself over to S3. Firstly, s3sync needs to know the access key ID and secret key:

# export AWS_ACCESS_KEY_ID=<my key ID>
# export AWS_SECRET_ACCESS_KEY=<my key>

And now, we can run the sync:

# ./s3sync.rb -r . <my key ID>.Test:aprefix

The first time I tried this, it failed – the Ruby files are not +x… Grumble. OK, trying again:

# chmod +x *.rb
# ./s3sync.rb -r . <my key ID>.Test:aprefix

Hmmm. I got an error message:

./S3.rb:25:in `require': no such file to load -- openssl (LoadError)

Interesting. Wild guesswork followed: I know that the default package list of Ubuntu does not contain the full list of available packages – just those that are strictly Open Source (for some value of “Open Source”). I had a vague memory from somewhere, some time back, that OpenSSL has some odd licensing restrictions. Perhaps the Ruby SSL package is not part of the standard Ruby package because of this? That might also be the case with the other parts of the Ruby distribution mentioned in the note to the ruby package I mentioned above – which would explain why they were missing from the package manager.
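A quick way to test this theory is to probe for the extension directly, much the same way s3sync’s own require does – a minimal sketch:

```ruby
# Probe for the openssl extension, much as s3sync's `require 'openssl'` does.
begin
  require 'openssl'
  puts "openssl is available (#{OpenSSL::OPENSSL_VERSION})"
rescue LoadError => e
  # On Ubuntu, this is the failure that installing the missing
  # Ruby/OpenSSL package should fix.
  puts "openssl is missing: #{e.message}"
end
```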

Working on this basis, I checked my Settings/Repositories dialog in the package manager, and noted that it was only looking at the “main” source of software – “Canonical supported Open Source software”. I set the checkboxes for the “universe” and “multiverse” sources as well – for community-maintained packages and those with copyright restrictions – and searched for OpenSSL packages. Bang – I got the libopenssl-ruby1.8 package, which had not previously been visible. I installed it and tried again…

…and the result was better, but still not great – this time I got a 403 “Access forbidden” error. My first guess was that I must have mistyped either the access key ID or the key itself…. but further inspection (and a copy/paste to make absolutely sure) showed that that was not the case. Another thought occurred to me – it might be a problem with simultaneous access from jets3t Cockpit and from s3sync, so I tried shutting the former down – but to no avail.

OK, time to investigate. A quick check on the thread where the script was announced showed that one other person was getting the same response (you can see his comment if you search for “403”), but it was because he’d not replaced the string “bucket” in the sample command line with his bucket name – whoops. However, his second post, a little further down, gives an example of a call to a lower-level script that simply sends an S3 command and shows the result. I decided to try something like that:

# ./s3cmd.rb -dv list <my key ID>.Test 200

…and I got a message saying that there is too much of a clock skew between the “request time” and the current time. A quick check, and – aha! – my Ubuntu box’s clock is out by half an hour O_o. Fixing that, and then re-running the list command made it work!
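This makes sense: S3 requests carry a timestamp, and the service rejects any request whose timestamp is too far from its own clock (the allowed window is, I believe, around fifteen minutes). A rough sketch of that check, assuming you have a server-supplied HTTP Date header to compare against – the constant and helper name are my own:

```ruby
require 'time'

# S3 rejects requests whose timestamp strays too far from its own clock;
# the window is (I believe) roughly fifteen minutes.
MAX_SKEW = 15 * 60  # seconds

# Compare an HTTP Date header from the server with the local clock.
def clock_skewed?(server_date_header, now = Time.now)
  (now - Time.httpdate(server_date_header)).abs > MAX_SKEW
end
```

A clock that is out by half an hour, like mine was, fails this comfortably.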

# ./s3cmd.rb -dv list <my key ID>.Test 200
list <my key ID>.Test: 200  {}
--------------------
200
Trying command list_bucket <my key ID>.Test max-keys 200
prefix   with 100 retries left
Response code: 200
#

So, I retried the s3sync, and it returned with no output – a fine Unixy way of saying that it thinks it succeeded. I checked what I had stored in the bucket on S3 using jets3t Cockpit…. and w00t! It was all there, with sensible keys.

Next, I waited for a few minutes, then ran the command again to see if it really was synchronising rather than simply copying. Of course, it was hard to be sure – but the command returned much more quickly than last time, and the files’ modified times on S3 didn’t change – which would seem to imply that it didn’t try to re-write them.

As a final check, I touched one of the files, and synced again… but it was just as quick and there was no update to the modified time. Thinking that perhaps it was using hashes rather than the update time [1], I tried changing the file using vi and syncing again… and this time the command took slightly longer to return, and the modified time updated.

At that point I decided I was confident that s3sync was working as it should – and conveniently, the backup had completed. So it was time for the next step – installing the Unslung firmware on the slug.

Next: Installing Unslung.

[1] After perusing the thread on the Amazon site more closely, I discovered that yes, it does use hashes – it compares the MD5 of the local file with the MD5 hash stored against the object on S3.