Project: Automated offsite backups for an NSLU2 -- part 10

Posted on 14 November 2006 in NSLU2 offsite backup project

Previously in this series: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9.

I'm setting up automated offsite backups from my NSLU2 to Amazon S3. With surprisingly little effort, I've managed to get a tool called s3sync running on the "slug" (as it's known). s3sync is a Ruby script, so in order to run it, I had to install Ruby, which in turn meant that I had to replace the slug's firmware with a different version of Linux, called Unslung. All of this worked pretty much as advertised in the tools' respective documentation -- for the details, see the previous posts in this series.

With all of the pieces in place, I next needed to run some simple tests to make sure it could handle the kind of files I wanted it to back up. In particular, I wanted it to be able to handle deep directory hierarchies, and to remember user and group ownership and file permissions.

The first step was to create some test files.

# cd /tmp
# mkdir testdata
# cd testdata
# mkdir directorynumber1
# cd directorynumber1
# mkdir directorynumber2
# cd directorynumber2

...

# cd directorynumber21
# pwd
/tmp/testdata/directorynumber1/directorynumber2/directorynumber3/directorynumber4/directorynumber5/directorynumber6/directorynumber7/directorynumber8/directorynumber9/directorynumber10/directorynumber11/directorynumber12/directorynumber13/directorynumber14/directorynumber15/directorynumber16/directorynumber17/directorynumber18/directorynumber19/directorynumber20/directorynumber21
# cat > file000
000
# chmod 000 file000
# cat > file644
644
# chmod 644 file644
# cat > file777
777
# chmod 777 file777
# chown guest:nobody file777
# chown bin:administrators file000
# ls -lrt
----------    1 bin      administ        4 Nov 14  2006 file000
-rw-r--r--    1 root     root            4 Nov 14  2006 file644
-rwxrwxrwx    1 guest    nobody          4 Nov 14  2006 file777
#
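
Incidentally, if you're curious exactly how long that path is, wc will tell you (it counts the trailing newline as an extra character, so it should report a little over 380 here):

# pwd | wc -c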

So, I had some files with differing permissions and ownership at the bottom of a directory hierarchy whose path was over 350 characters long. I had a vague impression that there might be a 200-character key limit on S3, and I'm always wary of 255-character limits, so 350 seemed like a sensible test length -- if a system can manage 350, it can probably manage much larger figures, up to 32,767 or so... Anyway, the next step was to sync the whole thing up to S3:

# cd /tmp/s3sync/
# ./s3sync.rb -r /tmp/testdata <my key ID>.Test:yetanotherprefix
#

A quick check with jets3t Cockpit confirmed that everything was uploaded with appropriate-looking keys, and also with properties specifying decent-looking integer owner, group and permission values. This looked good -- no key-length limit issues. However, there was only one way to be absolutely sure that it was working:

# ./s3sync.rb -r <my key ID>.Test:yetanotherprefix/testdata/ /tmp/copytestdata
#

(Note the positions of the slashes, etc. -- the full syntax for s3sync can take a while to work out, but the README documents it well if you take the time to read it...)
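
To spell out the slash rule as I read it from the README (it's rsync-style; the prefix "someprefix" below is just a placeholder): without a trailing slash on the source, the source directory itself is recreated at the destination, so

# ./s3sync.rb -r /tmp/testdata <my key ID>.Test:someprefix

produces keys beginning someprefix/testdata/...; with a trailing slash, only the contents of the source are copied, so

# ./s3sync.rb -r <my key ID>.Test:someprefix/testdata/ /tmp/copytestdata

puts the directorynumber1 tree directly inside /tmp/copytestdata rather than in a testdata subdirectory.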

And then, to confirm that it was all OK:

# cd /tmp/copytestdata/directorynumber1/directorynumber2/directorynumber3/directorynumber4/directorynumber5/directorynumber6/directorynumber7/directorynumber8/directorynumber9/directorynumber10/directorynumber11/directorynumber12/directorynumber13/directorynumber14/directorynumber15/directorynumber16/directorynumber17/directorynumber18/directorynumber19/directorynumber20/directorynumber21/
# ls -lrt
-rw-r--r--    1 root     root            4 Nov 14 01:03 file644
----------    1 bin      administ        4 Nov 14 01:03 file000
-rwxrwxrwx    1 guest    nobody          4 Nov 14 01:03 file777
#

...which all looked correct!
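
For a less eyeball-based check, a recursive diff between the original and the copy should produce no output if the file contents all match -- this assumes a diff with -r support is available on the slug (BusyBox's may not have it, in which case diffutils can probably be installed via ipkg), and it won't verify ownership or permissions:

# diff -r /tmp/testdata /tmp/copytestdata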

So now I knew that s3sync would work from the NSLU2 to Amazon S3, that the file attributes I cared about were being persisted, and that deep directory hierarchies were not a problem. The next step would have to be to get it working with full SSL, as I don't really want my private data flying over the public Internet unencrypted, and then to put the whole thing into a shell script and schedule a cron job to sync daily.
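
As a rough sketch of where that's heading -- none of this is tested yet, the script name, paths and schedule are placeholders, and the --ssl option and the SSL_CERT_DIR variable are things I still need to confirm against the s3sync README -- the script would look something like this:

#!/bin/sh
# /opt/bin/s3backup.sh (hypothetical) -- nightly sync of the data share to S3
export AWS_ACCESS_KEY_ID=<my key ID>
export AWS_SECRET_ACCESS_KEY=<my secret key>
# Directory of CA certificates for --ssl to verify the server against
export SSL_CERT_DIR=/opt/etc/ssl/certs
cd /tmp/s3sync
./s3sync.rb -r --ssl /share/hdd/data <my key ID>.Backup:data

...and then a crontab entry along the lines of

30 2 * * * /opt/bin/s3backup.sh

would run it at 2:30 every morning.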

Next: SSL, and scheduling part 1.