Project: Automated offsite backups for an NSLU2 -- part 9

Posted on 14 November 2006 in NSLU2 offsite backup project

Previously in this series: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8.

I'm setting up automated offsite backups from my NSLU2 to Amazon S3. The tool I'm using to make this happen is called s3sync; it's a Ruby script, so to run it I had to work out some way of installing Ruby. That in turn meant replacing the slug's firmware with a different version of Linux, called Unslung; once that was done, getting Ruby up and running wasn't too tricky. The next step was to get s3sync itself working.

I started by copying the s3sync script itself over to the slug. It's distributed as a gzipped tar archive from a web server -- so the obvious tools to use were wget, gunzip, and tar. Conveniently, these were already installed as part of Unslung (or perhaps the original firmware):

# gunzip
gunzip: compressed data not read from terminal.  Use -f to force it.
# wget
BusyBox v1.00 (2006.04.11-01:22+0000) multi-call binary

Usage: wget [-c|--continue] [-q|--quiet] [-O|--output-document file]
                [--header 'header: value'] [-Y|--proxy on/off] [-P DIR] url

wget retrieves files via HTTP or FTP

Options:

...

# tar
BusyBox v1.00 (2006.04.11-01:22+0000) multi-call binary

Usage: tar -[czjZxtvO] [-f TARFILE] [-C DIR] [FILE(s)] ...

Create, extract, or list files from a tar file.

Options:

...

#

So, I created a temporary directory, downloaded, and unpacked the script:

# cd /tmp
# mkdir s3sync
# cd s3sync
# wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
Connecting to s3.amazonaws.com[72.21.206.42]:80
# gunzip s3sync.tar.gz
# tar xf s3sync.tar
# ls
HTTPStreaming.rb README.txt      README_s3cmd.txt S3.rb
S3_s3sync_mod.rb S3encoder.rb    s3cmd.rb         s3sync.rb
s3sync.tar       s3try.rb         thread_generator.rb
#

The Ruby files, I noted, were not marked as executable, so I fixed that. I also remembered from when I was installing Ruby on the slug that the interpreter installed by the standard Unslung package was not on the path -- it lives at /opt/bin/ruby -- so the "#!" lines at the start of the scripts would probably need to be changed to reflect that. I checked the start of the s3sync scripts, and noticed that all of the top-level ones used the following first line:

#!/usr/bin/env ruby

This looked a bit odd to me -- I've used env to list the environment, but never as a launcher for an interpreter. However, a quick poke around made me comfortable that it was just a way of avoiding putting an explicit path to the interpreter into the script file: env searches the directories on the PATH for the named command and runs it, so the script works wherever ruby happens to be installed. As /usr/bin/env did not exist on the slug yet -- though perhaps I could have installed it -- I decided to modify the scripts to refer to the location of the ruby command on the machine.
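The fix-up amounts to something along these lines -- a sketch rather than a transcript, since the exact sed expression is my reconstruction; it writes each result to a temporary file in case the slug's BusyBox sed lacks in-place editing, and the chmod comes last because files recreated via a redirect lose their permissions:

# for f in *.rb; do
>   sed '1s|^#!/usr/bin/env ruby|#!/opt/bin/ruby|' "$f" > "$f.new" && mv "$f.new" "$f"
> done
# chmod +x *.rb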

The next steps were to set up the access key ID and the secret key, just as before:

# export AWS_ACCESS_KEY_ID=<my key ID>
# export AWS_SECRET_ACCESS_KEY=<my key>

...and to try running the script as a test, once more synchronising the script's own directory into the bucket I'd previously set up on S3 (with a different key prefix to the one I used in my original test):

# ./s3sync.rb -r . <my key ID>.Test:adifferentprefix
./S3encoder.rb:42:in `iconv': invalid encoding ("UTF-8", "ISO-8859-1") (Iconv::InvalidEncoding)
        from ./S3encoder.rb:42:in `escape'
        from ./S3.rb:138:in `list_bucket'
        from ./s3sync.rb:21:in `map'
        from ./S3.rb:138:in `each'
        from ./S3.rb:138:in `map'
        from ./S3.rb:138:in `list_bucket'
        from ./s3try.rb:51:in `send'
        from ./s3try.rb:51:in `S3try'
        from ./s3sync.rb:244:in `s3TreeRecurse'
        from ./s3sync.rb:293:in `main'
        from ./thread_generator.rb:79:in `call'
        from ./thread_generator.rb:79:in `initialize'
        from ./thread_generator.rb:76:in `new'
        from ./thread_generator.rb:76:in `initialize'
        from ./s3sync.rb:226:in `new'
        from ./s3sync.rb:226:in `main'
        from ./s3sync.rb:631
#

Oh dear. Well, I'd wanted to learn Ruby for some time, so here was a great incentive. The line causing the error, line 42 in S3encoder.rb, read:

result = Iconv.iconv("UTF-8", @nativeCharacterEncoding, string).join if @useUTF8InEscape

A bit of Googling around led to a (the?) Ruby documentation site, where the page describing the Iconv class made it clear that this call was asking the runtime to convert the string in the variable string from whatever charset was named in nativeCharacterEncoding (the @ prefix marks it as a Ruby instance variable) into UTF-8. A few lines higher up, nativeCharacterEncoding was being set to "ISO-8859-1", which made sense, especially given the error message: the iconv library on the slug evidently couldn't convert between those two encodings.

However, this seemed strange -- after all, UTF-8 is pretty much the standard character encoding for new applications and systems, and ISO-8859-1, aka Latin-1, is the older eight-bit encoding that predated it (and was, at the time, the default for most HTML). Still, the slug is a small embedded system -- so perhaps, I thought, it might lack certain charsets? Might it be something dreadful like ASCII-only?
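One way to have confirmed that the slug's iconv support, rather than anything in s3sync itself, was the culprit would have been to try the same conversion in isolation -- a hypothetical check, assuming ruby at /opt/bin/ruby as before:

# /opt/bin/ruby -e 'require "iconv"; puts Iconv.iconv("UTF-8", "ISO-8859-1", "hello").join'

On a machine with working charset support this just prints hello back; on the slug as it stood, it should have raised the same Iconv::InvalidEncoding error as the sync did.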

I decided to read through the list of available packages, to see if there was something obvious that needed to be installed - an "essential-charsets" package or something like that:

# ipkg list
abook - 0.5.5-1 - Abook is a text-based addressbook program designed to use with mutt mail client.
adduser - 1.1.3-6 - a multi-call binary for login and user account administration
adns - 1.3-2 - Asynchronous resolver library and DNS resolver utilities.

...

gambit-c - 4.0b20-1 - A portable implementation of Scheme.
gawk - 3.1.5-1 - Gnu AWK interpreter
gconv-modules - 2.2.5-5 - Provides gconv modules missing from the firmware.  These are used by glibc's iconv() implementation.

Now that last one looked promising - after all, as the Ruby documentation said:

Iconv is a wrapper class for the UNIX 95 iconv() function family, which translates string between various encoding systems.

I gave it a go:

# ipkg install gconv-modules
Installing gconv-modules (2.2.5-5) to root...
Downloading http://ipkg.nslu2-linux.org/feeds/unslung/cross/gconv-modules_2.2.5-5_armeb.ipk
Configuring gconv-modules
#

...and tried running the command again:

# ./s3sync.rb -r . <my key ID>.Test:adifferentprefix
S3 command failed:
list_bucket <my key ID>.TEST max-keys 200 prefix adifferentprefix/. delimiter /
With result 403 Forbidden
S3 ERROR: #<Net::HTTPForbidden:0x4065bf04>
./s3sync.rb:249:in `+': can't convert nil into Array (TypeError)
        from ./s3sync.rb:249:in `s3TreeRecurse'
        from ./s3sync.rb:293:in `main'
        from ./thread_generator.rb:79:in `call'
        from ./thread_generator.rb:79:in `initialize'
        from ./thread_generator.rb:76:in `new'
        from ./thread_generator.rb:76:in `initialize'
        from ./s3sync.rb:226:in `new'
        from ./s3sync.rb:226:in `main'
        from ./s3sync.rb:631
#

This was fantastic news! Although it had not synced, it had clearly contacted S3, and had been refused access -- so the charset problem was, it appeared, solved.

Now, back when I tried to get s3sync to work on my Ubuntu box, I'd discovered that it would refuse to sync when the machine's local time was skewed from the S3 server's time -- each request is signed with a timestamp, and S3 rejects requests with a 403 if the clock is too far out, which would explain the error above. I'd foolishly forgotten to check the slug's time before trying this sync, so before trying anything else I decided to check that it was OK:

# date
Tue Nov 14 02:49:31 GMT 2006

D'oh. It was 11:43pm on Monday 13 November when I typed that, so the slug's clock was running about three hours fast.
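One quick way to find out what time S3 itself thinks it is would be to pull the standard Date header from one of its HTTP responses -- a hypothetical one-liner, not something from my actual session:

# /opt/bin/ruby -e 'require "net/http"; Net::HTTP.start("s3.amazonaws.com") { |http| puts http.head("/")["date"] }'

Anyway, I fixed the clock -- the argument to BusyBox's date is in MMDDhhmm form -- and tried again: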

# date 11132344
Mon Nov 13 23:44:00 GMT 2006
# ./s3sync.rb -r . <my key ID>.Test:adifferentprefix
#

Which looked pretty good. I checked the S3 server, using the jets3t Cockpit tool that I'd used before, and lo and behold -- the files were there!

So, I had now successfully used s3sync to synchronise a directory from my NSLU2 up to Amazon S3 -- which was the main point of this project. There was still a little work to do -- for example, making sure it worked with reasonably deep directory hierarchies, checking that user/group ownership and permissions were preserved, setting up encryption, and setting up a cron job to automate the backup -- but the trickiest, most experimental part of the work was done.
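That cron job might wind up looking something like this -- a sketch only, with placeholder paths, bucket names and schedule, and assuming s3sync eventually gets a permanent home rather than living in /tmp:

#!/bin/sh
# backup.sh -- hypothetical wrapper for the nightly sync; the source
# directory, bucket and prefix below are placeholders
export AWS_ACCESS_KEY_ID=<my key ID>
export AWS_SECRET_ACCESS_KEY=<my key>
cd /opt/s3sync
./s3sync.rb -r /share/data <my key ID>.Test:backups

...plus a crontab entry to run it in the small hours:

30 2 * * * /opt/s3sync/backup.sh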

Next: File attributes, and deep directory hierarchies.